> For example, if AI scraper bots are running up your bandwidth bill or server load, shouldn't you be able to stop them? I would argue yes
I would also say yes, but not because the access is unauthorized; the justification is the excessive server load (which is what you describe).
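To make the distinction concrete, a minimal sketch of load-based throttling (rather than authorization-based blocking) might look like the following; the window size and per-IP budget are invented numbers, not recommendations:

```python
import time
from collections import defaultdict, deque

# Hypothetical per-IP sliding-window rate limiter: the point is to throttle
# excessive load, not to decide who is "authorized" to fetch public files.
WINDOW_SECONDS = 60    # assumed window; tune to your own traffic
MAX_REQUESTS = 120     # assumed per-IP budget within that window

_hits = defaultdict(deque)

def allow_request(ip, now=None):
    """Return True if this IP is still under its request budget."""
    now = time.monotonic() if now is None else now
    q = _hits[ip]
    # Drop timestamps that have aged out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= MAX_REQUESTS:
        return False   # over budget: respond 429 instead of serving the file
    q.append(now)
    return True
```

This rejects whoever is generating the load, whatever they claim to be, which is the whole point: the policy is about cost, not identity.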
Allowing public mirrors of the files is one thing that can help (providing archive files for bulk download might also be useful sometimes), although that does not actually prevent excessive scraping, since the bots are badly written and ignore such alternatives (which is also what you describe).
Some people use Cloudflare, but Cloudflare has its own problems: it blocks a lot of legitimate access while not necessarily stopping all of the illegitimate access, and it sometimes causes additional problems (some of which may be due to misconfiguration, but not always).
> These AI bots will ignore your robots.txt, they'll change user agents if you start to block their user agents, they'll use different IP subnets if you start to block IP subnets
In my experience they change user agents and IP subnets whether or not you block them, and regardless of what else you might do.
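For what that circumvention defeats: such blocking usually boils down to a static match like the sketch below (the user-agent substrings and the subnet are invented examples), and a bot that rotates both fields sails straight past it.

```python
import ipaddress

# Invented examples; a real blocklist would be larger and change constantly.
BLOCKED_UA_SUBSTRINGS = ["ExampleAIBot", "SomeScraper"]
BLOCKED_SUBNETS = [ipaddress.ip_network("203.0.113.0/24")]  # TEST-NET-3

def is_blocked(user_agent, ip):
    """Match a request against a static user-agent/subnet blocklist."""
    if any(s in user_agent for s in BLOCKED_UA_SUBSTRINGS):
        return True
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_SUBNETS)

# A bot that changes its user agent and moves to a fresh subnet matches
# neither list, so the check returns False and the request goes through.
```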