> I don't want to defend them, because they gate away a good chunk of the internet with their "bot protection"

They also gate away a good many people with their "bot protection". I am extremely worried about how so many seem to have outsourced the control over who can access their websites to a company, with no second thoughts whatsoever.

The problem is: what is the alternative? I'm not defending them or this practice by any measure, but we all know what happens if you just open your site up without these, especially with AI bots which hammer servers and are in effect a legalized DDoS system. I've hated CAPTCHAs ever since I first encountered them and I can't wait for them to finally die a permanent death, but I also don't know how we solve the "how do you tell a human from a bot" problem in a way which doesn't require server admins to have extremely beefy servers or similar setups to handle the extra load. I'm not going to do the "there HAS to be a way" thing either because, for all I know, this could just be one of those impossible-to-solve problems.

> we all know what happens if you just open your site up without these, especially with AI bots which hammer servers and are in effect a legalized DDoS system

No, we don't know. I honestly do not understand the problem. I run websites, both static and non-static. Granted, my sites aren't exactly the most popular internet go-to destinations, but I should be seeing this DDoS too, right?

I do see lots of requests. Nothing that any modern system can't handle. Computers are stupid fast these days. Unless you are doing something unreasonable, it's really hard to even notice this "extra load".

I understand there are sites for whom this causes problems, but I think these are rare and could be optimized not to do unreasonable things.

I think too many people are annoyed by AI companies (an arguably understandable position), look at their logs and speak of "hammering", "DDoS" and "extra load", while in reality it doesn't matter much.

A small, non-static e-commerce site focused on a single EU country, with proper robots.txt instructions and nginx/php-fpm rate limiting that worked perfectly well in the "era" of search engine & co. bots only, is kinda struggling without CF to handle 15000 requests per 15 minutes coming from Chrome "users" on IPv6. Best so far was an average server load of 40 in htop on an 8-core server x_x
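For scale, 15000 requests per 15 minutes works out to roughly 17 requests per second on average. Below is a minimal Python sketch of per-client rate limiting for that kind of IPv6 traffic. It keys the limit on the /64 prefix rather than the full address, which is my assumption (a single client can often rotate through many addresses inside one /64), not something taken from the setup described above. The names `client_key` and `TokenBucketLimiter`, and the rate and burst values, are hypothetical.

```python
import ipaddress
import time


def client_key(addr: str) -> str:
    """Group IPv6 clients by their /64 prefix; keep IPv4 addresses as-is."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        # strict=False accepts a host address and returns its enclosing network
        return str(ipaddress.ip_network(f"{addr}/64", strict=False))
    return str(ip)


class TokenBucketLimiter:
    """Hypothetical token bucket: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets = {}  # key -> (tokens, timestamp of last update)

    def allow(self, addr: str) -> bool:
        key = client_key(addr)
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False


# Average rate for the figure quoted above: 15000 requests in 15 minutes.
average_rps = 15000 / (15 * 60)
```

This only caps the rate per prefix. If the requests are spread across many unrelated prefixes, each one stays under the limit and the total load still adds up, which may be part of why per-address limits stop being enough.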

Has anyone pointed an AI scraper at your server at all? Unless your website appears in search engine listings I don't think the AI scrapers will slam it. My server has never been hit by them but my server is also practically unknown. All of this said, I'm not going to claim that server loads can handle it because many sysadmins have claimed otherwise, and I would like to think that their claims are reliable.

As soon as you get your TLS certificate you get bombarded with scraping. You don't need someone to "point a scraper at you".

What matters most is usually how much there is to scrape. If you have like 5 pages, that's nothing. For forum-like websites where each thread, each user profile, etc. gets scraped, that's when traffic increases. I just let them have at it with no issues though, computers are fast.

Also, how do we even know they're really "AI scrapers", or just a deliberate DDoS to push sites into using CF or other "anti-bot" providers?

You get downvoted for these opinions but I agree. Most people who complain that their servers get hammered by AI bots are those who run very unoptimized servers that can only handle like 100 rps. I've never had any issues with any of my moderately optimized websites. A $10 VPS can handle sooo much traffic.

I don't think it's just privacy, it also increasingly turns the web itself into a walled garden. The end result is that websites can only ever be accessed by approved clients - the latest Chrome, Edge and Firefox if you're lucky - and nothing else.

I can no longer access any website that's "protected" by Cloudflare. As soon as a website enables that stuff… "Shoot, another one bites the dust." I wonder if the website owners realise at all how many actual users they lose by this sort of "protection."

Cloudflare will just tell them that the 70% traffic drop is because 70% of their traffic was bots, and everything is working fine, and hey, don't you want to upgrade to a paid plan to block 50% of the remainder? Think about how many bots will be blocked with that upgrade!

> I wonder if the website owners realise at all how many actual users they lose by this sort of "protection."

How many people do you think are browsing with a weird enough config (e.g. a custom browser like OP, or some weird config like Firefox with fingerprinting protection on a Raspberry Pi) to trip Cloudflare's protection?

I got locked out of some websites by Cloudflare Turnstile on some very standard configurations, like an iPhone on Safari, or a Windows 11 desktop with Firefox or Edge, neither with a VPN on. I never found out why.

Well… I know plenty of people in my circle affected by this. Just have a slightly outdated system you simply can't afford to update: it's way too easy to get cut off like this. IMHO, a rather systematic discrimination against poorer people.

There are dozens of us :)

In my experience what really makes it loop every single time though is JShelter. CF doesn't like having your fingerprintable data bits messed with.

There are legitimate uses for non-intrusive, ethical and legal scraping, but some of us have had to resort to extreme measures:

https://roundproxies.com/blog/bypass-bot-detection/
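On the "non-intrusive" side, the usual baseline is checking robots.txt before fetching anything. A minimal sketch using Python's standard `urllib.robotparser` follows. The robots.txt content, the `example-bot` user agent and the example.com URLs are all made up for illustration, and a real crawler would fetch the file from the target site instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

USER_AGENT = "example-bot"  # hypothetical user agent string


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask before fetching: is this URL allowed for our user agent?
allowed = parser.can_fetch(USER_AGENT, "https://example.com/public/page")
blocked = parser.can_fetch(USER_AGENT, "https://example.com/private/page")

# Seconds the site asks crawlers to wait between requests, or None if unset.
delay = parser.crawl_delay(USER_AGENT)
```

None of this gets a well-behaved scraper past a fingerprinting challenge, of course; it only covers the part a scraper controls itself.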

I'm one of those who have enabled Cloudflare on all of the sites I maintain. Additionally, I added Turnstile on every form.

I know some actual users get blocked. But the amount of spam we get without it, the amount of bot traffic simply overwhelming the server... It is just too much.

Recently I also hard-blocked all IPs from China, Singapore, India, Pakistan, Russia and the whole of Africa. Do I want to do it? No. But the amount of bot traffic and corresponding spam is a bigger problem :(

They sometimes have to comply with legal requests (which I understand), but at the same time they have a huge market share - which means that the internet is becoming less and less decentralized and more in their control. We've seen the effects of that in previous outages...

> I am extremely worried about how so many seem to have outsourced the control over who can access their websites to a company, with no second thoughts whatsoever.

I think the Web is on its last legs, anyway. Generative AI and LLM-instead-of-search have destroyed what little value remained.

It's just one more facet of the enshittoscene, the era where actual product quality is completely irrelevant. Put it in the same bucket as websites that lag when you scroll, apps that refuse to show you video without a huge play/pause button overlaid in the middle of it that never goes away, and the movie Melania. My hypothesis is that billion-dollar businesses no longer exist to sell things to customers, but only to impress other billionaires to get their investment money.