> Web pages aren’t digitally signed, aren’t necessarily indexed by search engines
Neither of these prevents scraping, and the lack of signatures actually makes it worse: every scraper has to go to the original server and bog it down, instead of getting the data from anyone who has a copy and verifying it against the signature.
> there are ways to block bots with things like captchas
These don't work if you have anything resembling high-value content. AI can solve captchas now, and a scraper that only needs to grab a few hundred articles once can just do the same proof of work as a real user. If they want it enough, they can also pay someone in a low-income country to download the articles manually. Fundamentally, if you post something that any human can access, then someone can copy it. Public is public.
And if the content is the equivalent of blog comments, they can probably still get it, but in that case why even care if they do? Notice that the same thing happens on the centralized services, e.g. Facebook uses your Facebook posts to train AI.