Ironic part ... LLMs are very good at solving CAPTCHAs. So the only people bothered by those same CAPTCHAs are the actual site visitors.
What sites need to do is temporarily block repeat requests from the same IPs. Sure, some agents use tens of thousands of IPs, but if they are really as aggressive as people claim, you're going to run into the same IPs far more often than you do with normal users.
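Something like the sketch below is all I mean by "temp block": a per-IP fixed-window counter with a cooldown. The thresholds and names are made up for illustration, and a real setup would put this in front of the app (nginx, a CDN rule, etc.) rather than in memory.

```python
# Rough sketch of temp-blocking repeat IPs: count hits per IP in a time
# window, and block the IP for a while once it goes over the limit.
# All thresholds are assumed values, not recommendations.
import time
from collections import defaultdict

WINDOW_SECONDS = 60        # counting window (assumed)
MAX_HITS_PER_WINDOW = 120  # ~2 req/s sustained before a block (assumed)
BLOCK_SECONDS = 15 * 60    # length of the temporary block (assumed)

_hits = defaultdict(list)  # ip -> timestamps of recent requests
_blocked_until = {}        # ip -> unix time when its block expires

def allow_request(ip: str) -> bool:
    """Return True if the request should be served, False if the IP is blocked."""
    now = time.time()
    if _blocked_until.get(ip, 0) > now:
        return False
    # keep only hits inside the current window, then record this one
    recent = [t for t in _hits[ip] if now - t < WINDOW_SECONDS]
    recent.append(now)
    _hits[ip] = recent
    if len(recent) > MAX_HITS_PER_WINDOW:
        _blocked_until[ip] = now + BLOCK_SECONDS
        return False
    return True
```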
That will kick out the overly aggressive guys. I have done web scraping and limited it to around 1 req/s. You never run into any blocking or detection that way because you hardly show up. But then you get some *** who sends thousands of parallel requests at a website, because they never figured out query builders for large page pulls and never learned to check last-updated pages first.
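For what it's worth, the "around 1 req/s" style is nothing fancy; a sketch of the idea, with placeholder URLs and delay:

```python
# Minimal sketch of polite, sequential scraping at roughly 1 request/second.
import time
import urllib.request

def fetch_politely(urls, delay=1.0):
    """Fetch each URL in turn, sleeping so we never exceed ~1 req/s."""
    for url in urls:
        start = time.time()
        with urllib.request.urlopen(url, timeout=30) as resp:
            body = resp.read()
        yield url, body
        # sleep off whatever is left of the one-second budget
        elapsed = time.time() - start
        if elapsed < delay:
            time.sleep(delay - elapsed)
```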
One of the main issues I see is that some people simply write the most basic of basic scrapers. See link, follow, spawn process, scrape, see 100 more links ... Updates? Just rescrape the whole website, repeat, repeat... It takes time to build a scrape template for each website that knows where to check for updates, so some never bother.
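One cheap version of "knows where to check for updates" is just a conditional GET, so unchanged pages cost a 304 instead of a full re-scrape. A hedged sketch; whether a given site honours If-Modified-Since varies, and a real template would also look at sitemaps or listing pages:

```python
# Sketch of an update check via a conditional GET (If-Modified-Since / 304).
import urllib.error
import urllib.request
from email.utils import formatdate

def fetch_if_changed(url, last_fetch_unix_time):
    """Return the page body if it changed since the last fetch, else None."""
    req = urllib.request.Request(url)
    req.add_header("If-Modified-Since", formatdate(last_fetch_unix_time, usegmt=True))
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as e:
        if e.code == 304:  # not modified, skip the re-scrape
            return None
        raise
```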
I often use a VPN or iCloud Private Relay. Some sites gripe “too many accesses (downloads) from your IP address today.”
The devil’s in the details. I (a non-bot) sometimes resort to VPN-flipping.
I suppose that some bots try this, just a wild guess.