You'd think they would have an interest in developing reasonable crawling infrastructure, like Google, Bing, or Yandex have. Instead they go all in, hammering hosts with no metering whatsoever. All of the search majors reduce their crawl rate as request times increase.
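That adaptive behavior is simple enough to sketch. Something like the following (rough Python; the 10:1 ratio, the User-Agent, and the backoff constants are made up for illustration, not any particular crawler's actual logic):

    import time
    import requests

    def polite_crawl(urls, base_delay=1.0, max_delay=60.0):
        # Adaptive politeness: scale the gap between requests with the
        # observed response time, and back off hard on 429/503.
        delay = base_delay
        for url in urls:
            start = time.monotonic()
            resp = requests.get(url, timeout=30,
                                headers={"User-Agent": "example-bot/1.0"})
            elapsed = time.monotonic() - start
            if resp.status_code in (429, 503):
                delay = min(delay * 2, max_delay)  # server told us to slow down
            else:
                # Stay idle roughly 10x as long as the server was busy.
                delay = min(max(base_delay, elapsed * 10), max_delay)
            yield url, resp
            time.sleep(delay)

A slow or overloaded origin automatically gets crawled more gently. That's the whole trick, and these crawlers skip it.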
On the one hand, these companies present themselves as sophisticated, futuristic, and highly valued; on the other, we see rampant incompetence, to the point that webmasters everywhere are debating the best course of action.
I suspect it's because they're dealing with such unbelievable levels of bandwidth and compute for training and inference that the amount required to blast the entire web like this barely registers.
Honestly, it's just the tragedy of the commons. Why put in the effort when you don't have to identify yourself? Just crawl, and if you get blocked, move the job to another server.
At this point I'm blocking several ASNs. Most are cloud-provider related, but there are also some repurposed consumer ASNs coming out of the PRC. Long term, this devalues the offerings of those cloud providers, as prospective customers will not be able to use them for crawling.
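If anyone wants to replicate this, the plumbing is roughly as follows (a sketch, not my exact setup: it assumes RIPEstat's announced-prefixes endpoint and an nftables set named blocked_asns that already exists with flags interval; double-check both against your environment):

    import sys
    import requests

    RIPESTAT = "https://stat.ripe.net/data/announced-prefixes/data.json"

    def prefixes_for_asn(asn):
        # Ask RIPEstat which prefixes the ASN currently announces.
        resp = requests.get(RIPESTAT, params={"resource": f"AS{asn}"},
                            timeout=30)
        resp.raise_for_status()
        return [p["prefix"] for p in resp.json()["data"]["prefixes"]]

    if __name__ == "__main__":
        # Usage: python block_asn.py 12345 | nft -f -
        # One-time setup (hypothetical table/set names):
        #   nft add table inet filter
        #   nft add set inet filter blocked_asns \
        #       '{ type ipv4_addr; flags interval; }'
        asn = sys.argv[1]
        v4 = [p for p in prefixes_for_asn(asn) if ":" not in p]
        print("add element inet filter blocked_asns {", ", ".join(v4), "}")

Re-run it on a schedule, since announcements change, and remember that v6 needs its own set.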
This is the correct solution, and it's how network abuse was dealt with before the latest fad. Network operators can either police their own users or be blocked/throttled wholesale. Nothing more is needed except the willingness to apply those measures to networks that are "too big to fail".
They vibe code their crawlers.