Hacker News

Lies in user agent strings were for bypassing bugs, poor workarounds and assumptions that became wrong; they are nothing like what we are talking about.

gkbrk 2 days ago

A server returning HTML for Chrome but not cURL seems like a bug, no?

This is why there are so many libraries to make requests that look like they came from a browser, to work around buggy servers or server operators with wrong assumptions.

grayhatter a day ago

> A server returning HTML for Chrome but not cURL seems like a bug, no?

tell me you've never heard of https://wttr.in/ without telling me. :P

It would absolutely be a bug iff this site returned html to curl.
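
For illustration, a service like this might pick its response format from the User-Agent header. This is only a minimal sketch, not wttr.in's actual code: the `PLAIN_TEXT_AGENTS` list and the `choose_format` function are hypothetical.

```python
# Hypothetical list of command-line clients that should get plain text.
PLAIN_TEXT_AGENTS = ("curl", "wget", "httpie")


def choose_format(user_agent: str) -> str:
    """Return "text" for known terminal clients, "html" for everything else."""
    ua = user_agent.lower()
    if any(agent in ua for agent in PLAIN_TEXT_AGENTS):
        return "text"
    return "html"
```

Under this assumption, serving HTML to curl would be the bug, because the terminal user asked for something they can read in a terminal.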

> This is why there are so many libraries to make requests that look like they came from a browser, to work around buggy servers or server operators with wrong assumptions.

This is a shallow take. The best counterexample is how Googlebot has no problem identifying itself both in and out of the user agent. Do note that user agent packing is distinctly different from a fake user agent selected randomly from a list of the most common ones.

The existence of many libraries with the intent to help conceal the truth about a request doesn't feel like proof that's what everyone should be doing. It feels more like proof that most people only want to serve traffic to browsers and real users. And it's the bots and scripts that are the fuckups.

batch12 a day ago

Googlebot has no problem identifying itself because Google knows that you want it to index your site if you want visitors. It doesn't identify itself to give you the option to block it. It identifies itself so you don't.

grayhatter a day ago

I care much less about being indexed by Google than you might think.

Google bot doesn't get blocked from my server primarily because it's a *very* well behaved bot. It sends a lot of requests, but it's very kind, and has never acted in a way that could overload my server. It respects robots.txt, and identifies itself multiple times.

Google bot doesn't get blocked, because it's a well behaved bot that eagerly follows the rules. I wouldn't underestimate how far that goes towards the reason it doesn't get blocked. Much more than the power gained by being google search.
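
As a rough sketch of what "following the rules" can look like for a bot author, Python's standard `urllib.robotparser` can check a URL against robots.txt before any request is sent. The robots.txt content, the `ExampleBot` name and the example.com URLs are made up for illustration; a real crawler would fetch the file from the site it is about to crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
BOT_NAME = "ExampleBot"  # hypothetical bot name, sent honestly in the UA

allowed_public = rules.can_fetch(BOT_NAME, "https://example.com/index.html")
allowed_private = rules.can_fetch(BOT_NAME, "https://example.com/private/data")
delay_seconds = rules.crawl_delay(BOT_NAME)  # seconds to wait between requests
```

A bot that skips disallowed paths and sleeps for the advertised delay is much less likely to give an operator a reason to block it.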

batch12 a day ago

Yes, the client wanted the server to deliver content it had intended for a different client, regardless of what the service operator wanted, so it lied using its user agent. Exact same thing we are talking about. The difference is that people don't want companies to profit off of their content. That's fair. In this case, they should maybe consider some form of real authentication, or if the bot is abusive, some kind of rate limiting control.
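
To make "some kind of rate limiting control" concrete, here is a minimal token bucket sketch. It is one common approach, not the only one; the `TokenBucket` class and its parameters are hypothetical names chosen for this example.

```python
import time


class TokenBucket:
    """Allow short bursts while capping the sustained request rate."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so tests can fake time
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        """Spend one token if available; otherwise reject the request."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A server would typically keep one bucket per client key, such as an IP address or API token, and answer with HTTP 429 when `allow()` returns `False`. This limits abusive clients no matter what their user agent claims.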

jraph a day ago

Add "assumptions that became wrong" to "intended" and the perspective radically changes, to the point that omitting this part from my comment changes everything.

I would even add:

> the client wanted the server to deliver content it had intended for a different client

In most cases, the webmaster intended their work to look good, not really to send different content to different clients. That latter part is a technical means, a workaround. The intent of bringing the ok version to the end user was respected… even better with the user agent lies!

> The difference is that people don't want companies to profit off of their content.

Indeed¹, and also they don't want terrible bots to bring down their servers.

1: well, my open source work explicitly allows people to profit off of it - as long as the license is respected (attribution, copyleft, etc)

grayhatter a day ago

> Yes, the client wanted the server to deliver content it had intended for a different client, regardless of what the service operator wanted, so it lied using its user agent.

I would actually argue it's not nearly the same type of misconfiguration. Scripts that have never been a browser, and that omit their real identity, do it to evade bot detection. Browsers pack their UA with so much legacy data because of misconfigured servers. The server owner wants to send data to users and their browsers, but through incompetence, they've made a mistake. Browsers adapted by including extra strings in the UA to account for the expectations of incorrectly configured servers. Extra strings are the critical part; Googlebot's UA is an example of this being done correctly.
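
A small sketch of the difference, using a typical desktop Chrome user agent (the version number is only illustrative) and Googlebot's published one. The `has_product` helper is a hypothetical name for this example. Both strings carry the legacy `Mozilla/5.0` token, but each still names what it really is.

```python
import re

# Both strings carry legacy "Mozilla/5.0" compatibility tokens.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def has_product(user_agent: str, product: str) -> bool:
    """True if the UA contains a `product/version` token for `product`."""
    pattern = rf"\b{re.escape(product)}/[\d.]+"
    return re.search(pattern, user_agent) is not None
```

Here `has_product(CHROME_UA, "Safari")` and `has_product(CHROME_UA, "Chrome")` are both true: the extra strings are added, and the real identity stays. A script that sends `CHROME_UA` verbatim adds nothing of its own, which is the part that makes it a lie rather than packing.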