I second this. My website exposes a cgit and 99% of the traffic now is AI scraping the sources, but the load is nowhere near DoS territory. And this is running on the cheapest VPS I could find.
Not saying I'm not annoyed by the scraping; I am looking to block them, but I'm also not going to put the site behind the gatekeeper. If anything, Cloudflare must love AI scraping now for the same reason AV companies love malware.
Now, if you are running a PHP stack...yeah, maybe that's the problem right there.
Is there actually any plausible theory why "AI" would repeatedly scrape the same sites? Are there that many competing, completely independent AI labs? Is it cheaper to repeatedly scrape than to buffer the scraped data locally? (I find it very hard to imagine that it's easier to deal with changing/disappearing content than it is to stand up such a cache.)