Hacker News

>Over the last few days, I’ve been scraping everywhere I can think of, collating the links I can find out in the wild, and compiling my own database of links – and importantly, the URLs they redirect to. So far, I’ve found 12,000 links from scraping:

>Google (using their web search API)

>GitHub (using their API)

>Our own (somewhat limited) web logs

>The archive.org Stack Overflow data dumps

>Archive.org’s own list of archived webpages

You're an angel, Matt.