The question is who runs them? There are only a few big companies like MS, Google, OpenAI, Anthropic. But from the posts here it looks like hordes of buggy scrapers run by enthusiasts.

Lots of “data” companies out there that want to sell you scraped data sets.

Ad companies, even the small ones, "Brand Protection" companies, IP lawyers looking for images that were used without license, Brand Marketing companies, where it matters also your competitors etc etc