Looks like it's time for in-browser scrapers. They will be indistinguishable from the server side. With an AI driver they can pass even human tests.
> Looks like it's time for in-browser scrapers.
If scrapers were as well-behaved as humans, website operators wouldn't bother to block them[1]. It's the abuse that motivates the animus and action. As the fine articles spelt out, scrapers are greedy in many ways, one of which is trying to slurp down as many URLs as possible without wasting bytes. Not enough people know about Common Crawl, or know how to write multithreaded scrapers that keep utilization high across domains without suffocating any single one (see the sketch below). If your scraper is a URL FIFO or stack in a loop, you're just DoSing one domain at a time.
1. The most successful scrapers avoid standing out in any way
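To make that concrete, here's a minimal sketch of the per-domain politeness idea in TypeScript. The class name PoliteScheduler and the default 2-second cooldown are my own inventions, not from any real crawler library; the point is one FIFO per domain plus round-robin across domains, instead of one global FIFO:

    // Sketch: one FIFO per host, round-robin across hosts, with a
    // per-host cooldown so no single server sees more than one
    // request per delayMs. All names here are illustrative.
    class PoliteScheduler {
      private queues = new Map<string, string[]>();
      // Kept separate from the queues so a host's cooldown survives
      // even if its queue drains and is later refilled.
      private nextAllowed = new Map<string, number>();

      constructor(private delayMs = 2000) {}

      add(url: string): void {
        const host = new URL(url).host;
        const q = this.queues.get(host) ?? [];
        q.push(url); // FIFO within a domain
        this.queues.set(host, q);
      }

      // Yields URLs interleaved across domains; a host still cooling
      // down is skipped this round instead of being hammered.
      async *run(): AsyncGenerator<string> {
        while (this.queues.size > 0) {
          let progressed = false;
          for (const [host, q] of [...this.queues]) {
            if (Date.now() < (this.nextAllowed.get(host) ?? 0)) continue;
            const url = q.shift()!;
            this.nextAllowed.set(host, Date.now() + this.delayMs);
            if (q.length === 0) this.queues.delete(host);
            progressed = true;
            yield url;
          }
          // Every live host is cooling down; sleep instead of spinning.
          if (!progressed) await new Promise((r) => setTimeout(r, 100));
        }
      }
    }

    // Usage (Node 18+, global fetch):
    //   const s = new PoliteScheduler(1000);
    //   urls.forEach((u) => s.add(u));
    //   for await (const url of s.run()) { await fetch(url); /* parse, s.add(...) */ }

The contrast with a single global FIFO is the whole point: a naive frontier clusters URLs from one site together, so without per-domain cooldowns you end up hitting hosts one at a time at full speed.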
The question is: who runs them? There are only a few big companies like MS, Google, OpenAI, and Anthropic, but from the posts here it looks more like hordes of buggy scrapers run by enthusiasts.
There are lots of "data" companies out there that want to sell you scraped data sets.
Ad companies (even the small ones), "brand protection" companies, IP lawyers looking for images used without a license, brand-marketing companies, and, where it matters, your competitors too, etc.
Not a new idea. For years now, on the occasions I’ve needed to scrape, I’ve used a set of ViolentMonkey scripts. I’ve even considered creating an extension, but have never really needed it enough to do the extra work.
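For anyone who hasn't tried it, a userscript scraper really can be that small. The metadata header is the standard Greasemonkey-style format ViolentMonkey uses; the page URL and CSS selectors below are hypothetical stand-ins, not a real target:

    // ==UserScript==
    // @name     example-listing-scraper (sketch)
    // @match    https://example.com/listings*
    // @grant    none
    // ==/UserScript==

    // Because this runs inside the normal browser session, the server
    // just sees ordinary page views. Selectors are made up for the sketch.
    (function () {
      const rows = document.querySelectorAll(".listing-row");
      const data = [...rows].map((row) => ({
        title: row.querySelector("h2")?.textContent?.trim(),
        price: row.querySelector(".price")?.textContent?.trim(),
      }));
      // Dump to the console; a real script might accumulate results
      // across pages with GM_setValue instead.
      console.log(JSON.stringify(data, null, 2));
    })();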
But this is why lots of sites implement captchas and other mechanisms to detect, frustrate, or trap automated activity - because plenty of bots run in browsers too.
you mean OpenAI Atlas?