> A server returning HTML for Chrome but not cURL seems like a bug, no?
tell me you've never heard of https://wttr.in/ without telling me. :P
It would absolutely be a bug iff this site returned HTML to curl.
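You can see the content negotiation with two requests. A minimal sketch, assuming the third-party requests library; the User-Agent strings are just illustrative values:

```python
import requests

URL = "https://wttr.in/"

# A curl-like User-Agent gets the plain-text (ANSI) weather report...
as_curl = requests.get(URL, headers={"User-Agent": "curl/8.5.0"})

# ...while a browser-like User-Agent gets the full HTML page.
as_browser = requests.get(
    URL,
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"},
)

print(as_curl.headers.get("Content-Type"))     # e.g. text/plain; charset=utf-8
print(as_browser.headers.get("Content-Type"))  # e.g. text/html; charset=utf-8
```

Same URL, two deliberately different responses keyed off the User-Agent. That's a feature, not a bug.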
> This is why there are so many libraries to make requests that look like they came from a browser, to work around buggy servers or server operators with wrong assumptions.
This is a shallow take. The best counterexample is how Googlebot has no problem identifying itself both inside and outside the user agent. And note that packing identification into the user agent is distinctly different from a fake user agent selected at random from a list of the most common ones.
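For the "outside the user agent" part: Google documents forward-confirmed reverse DNS as the way to verify a real Googlebot, since the UA string alone is trivially forged. A minimal sketch of that check using Python's stdlib; the sample IP is from Google's published crawler range:

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS, as Google documents for Googlebot."""
    try:
        # Step 1: reverse lookup. Real Googlebot IPs resolve to
        # *.googlebot.com or *.google.com hostnames.
        host, _, _ = socket.gethostbyaddr(ip)
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Step 2: forward lookup. The hostname must resolve back to the
        # original IP, otherwise the PTR record could simply be forged.
        return ip in {addr[4][0] for addr in socket.getaddrinfo(host, None)}
    except socket.herror:   # no PTR record at all
        return False
    except socket.gaierror: # hostname doesn't resolve
        return False

print(is_verified_googlebot("66.249.66.1"))  # an IP in Google's crawler range
```

A randomized fake user agent can't survive that kind of check; Googlebot's identity holds up precisely because it isn't hiding.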
The existence of so many libraries built to conceal the truth about a request doesn't feel like proof that's what everyone should be doing. It feels more like proof that most operators only want to serve traffic to browsers and real users. And it's the bots and scripts that are the fuckups.
Googlebot has no problem identifying itself because Google knows that you want it to index your site if you want visitors. It doesn't identify itself to give you the option to block it. It identifies itself so you don't.
I care much less about being indexed by Google than you might think.
Googlebot doesn't get blocked from my server primarily because it's a *very* well behaved bot. It sends a lot of requests, but it's very polite and has never acted in a way that could overload my server. It respects robots.txt, and it identifies itself in multiple ways.
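For a sense of what that rule-following looks like from the bot's side, here's a minimal sketch using Python's stdlib urllib.robotparser; the site URL, paths, and agent name are placeholders:

```python
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"   # hypothetical site, for illustration only
AGENT = "Googlebot"

rp = RobotFileParser(f"{SITE}/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

# A well-behaved crawler asks before every fetch and honors the answer.
for path in ("/", "/private/reports"):
    url = f"{SITE}{path}"
    print(f"{url}: {'fetch' if rp.can_fetch(AGENT, url) else 'skip'}")

# It also honors any Crawl-delay the operator sets (None if unset).
delay = rp.crawl_delay(AGENT)
```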
Googlebot doesn't get blocked because it's a well behaved bot that eagerly follows the rules. I wouldn't underestimate how far that goes toward why it doesn't get blocked; much further than any power gained by being Google Search.