How is it available for everyone if the AI bots bring down your server?

Is that really the problem we are discussing? I've had people attack my server and bring it down. But that has nothing to do with being free and open to everyone. A top Hacker News post could take my server down.

Yes, because a top Hacker News post takes your server down when a large number of actual humans are looking to gain actual value from your posts. Meanwhile, you stand to benefit from the HN discussion by learning new things and perspectives from the community.

The AI bot assault, on the other hand, is one company (or a few companies) re-fetching the same data over and over again, constantly, in perpetuity, just in case it's changed, all so they can incorporate it into their training set and make money off of it while giving you zero credit and providing zero feedback.

But then we get to use those AI tools.

The refrain here comes down not to "AI" but mostly to "the AI bot assault", which is a different thing. Sure, let's have a discussion about badly behaved and overzealous web scrapers. As for credit, I've asked AI for its references and gotten them. If my information is merely mushed into an AI training model, I'm not sure why I need credit. If you discuss this thread with your friends are you going to give me credit?

No, you don't "get to" use the AI tools. You have to buy access to them (beyond some free trials).

Yes. I get to buy access to them. They're providing a service that is expensive to provide and requires specialized expertise. I don't see the problem with that.

"If you discuss this thread with your friends are you going to give me credit?"

Yes. How else would I enable my friends to look it up for themselves?

6 months from now when you've internalized this entire thread are you even going to remember where you got it from?

Why are you shifting the discussion by adding two new variables (time/memory)?

Because that's how one interacts with AI.

Yeah. Running out of arguments, are you?

Ultimately, you have to realize that this is a losing battle, unless we have completely draconian control over every piece of silicon. Captchas are being defeated; at this point they're basically just mechanisms to prove you Really Want to Make That Request to the extent that you'll spend some compute time on it, which is starting to become a bit of a waste of electricity and carbon.

Talented people that want to scrape or bot things are going to find ways to make that look human. If that comes in the form of tricking a physical iPhone by automatically driving the screen physically, so be it; many such cases already!

The techniques you need for preventing DDoS don't really need to differentiate much between bots and people unless you're being specifically targeted; Fail2Ban-style IP bans are still quite effective, and basic WAF functionality does a lot.
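
To make "Fail2Ban-style" concrete, here is a rough sketch of the idea in Python: count offending requests per IP inside a time window and ban the address for a while once it crosses a threshold. This is not Fail2Ban's actual code or configuration; the class name `BanList` and the parameters `max_hits`, `window`, and `ban_time` are made up for illustration, and what counts as an "offence" (a 404 burst, failed logins, and so on) is left to the caller.

```python
from collections import defaultdict, deque


class BanList:
    """Hypothetical helper: ban an IP after max_hits offences within window seconds."""

    def __init__(self, max_hits=5, window=60.0, ban_time=600.0):
        self.max_hits = max_hits
        self.window = window
        self.ban_time = ban_time
        self.hits = defaultdict(deque)  # ip -> timestamps of recent offences
        self.banned_until = {}          # ip -> time at which the ban expires

    def is_banned(self, ip, now):
        """Return True while the ban for this IP has not yet expired."""
        return self.banned_until.get(ip, 0.0) > now

    def record_offence(self, ip, now):
        """Log one offending request; return True if the IP is banned afterwards."""
        if self.is_banned(ip, now):
            return True
        recent = self.hits[ip]
        recent.append(now)
        # Drop offences that have aged out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            self.banned_until[ip] = now + self.ban_time
            recent.clear()
            return True
        return False
```

Note that nothing here asks whether the client is a bot or a person. It only looks at behavior per address, which is the point being made above.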

Agreed, copyright issues need to be solved via legislation and network abuse issues need to be solved by network operators. Trying to run around either only makes the web worse for everyone.

Rate-limits? Use a CDN? Lots of traffic can be a problem whether it's bots or humans.
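
For anyone wondering what a rate limit looks like in practice, a token bucket is one common way to do it. The sketch below is only an illustration under my own assumptions: the name `TokenBucket` and the `rate` and `capacity` parameters are hypothetical, you would keep one bucket per client (IP, API key, whatever you key on), and a real deployment would more likely use the limiter built into the web server or CDN.

```python
class TokenBucket:
    """Hypothetical per-client limiter: refill `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.last = None          # timestamp of the previous request

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        if self.last is not None:
            elapsed = now - self.last
            # Refill for the time that has passed, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The bucket treats heavy traffic the same way whether it comes from a scraper or from a crowd of humans: short bursts pass, sustained hammering gets refused.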

You realize this entire thread is about a pitch from a CDN company trying to solve an issue that has presented itself at such a scale that this is the best option they can think of to keep the web alive, right?

"Use a CDN" is not sufficient when these bots are so incredibly poorly behaved, because you're still paying for that CDN and this bad behavior is going to cost you a fortune in CDN costs (or cost the CDN a fortune instead, which is why Cloudflare is suggesting this).

Everyone can get it from the bots now?

Build better