Hacker News

hsbauauvhabzb 6 hours ago

Does mass scraping need Google for content discovery? Surely most sites publish a sitemap or index that would effectively self-enumerate once you know the domain, which is more often than not publicly disclosed?
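
For illustration, a minimal sketch of that kind of self-enumeration. It assumes the site advertises its sitemap through a `Sitemap:` line in robots.txt and uses the standard sitemaps.org XML namespace; many sites do neither. The function names and the example.com URLs are hypothetical, and fetching is left out, so both functions work on text you have already downloaded.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema (assumed; some sites omit it).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs named in 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split at the first colon only, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> entry in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


if __name__ == "__main__":
    robots = "User-agent: *\nSitemap: https://example.com/sitemap.xml\n"
    print(sitemaps_from_robots(robots))
```

A sitemap index returns further sitemap URLs rather than pages, so a real crawler would call `urls_from_sitemap` again on each of those.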

rvz 2 hours ago

What matters is when websites put this new version of reCAPTCHA on their site, just like archive.is has done. Then the scrapers will have a hard time getting around that.