Your whole comment assumes language identification is both trivial and fail-safe. It is neither, and it gets worse when you consider, e.g., pages whose elements are in different languages, or languages that are closely related and easy to confuse.

Even if language identification were very simple, you're still putting the burden on the user's tools to identify something the writer already knew.

Language detection (where “language” == one of the 200 languages that are actually used) IS trivial, given a paragraph of text.

And the fact is that the author of the web page doesn’t know the language of the content, if there’s anything user contributed. Should you have to label every comment on HN as “English”? That’s a huge burden on literally every internet user. Other written language has never specified its language. Imposing data-entry requirements on humans to satisfy a computer is never the ideal solution.

> 200 languages that are actually used

Do you have any reference for that, or are you implying we shouldn't support the other thousands[0] of languages in use just because they don't have a big enough user base?

> And the fact is that the author of the web page doesn’t know the language of the content, if there’s anything user contributed. Should you have to label every comment on HN as “English”? That’s a huge burden on literally every internet user.

In the case of Hacker News or other pages with user-submitted, multi-language content, you can just set each comment's lang attribute to the empty string, which means unknown and falls back to detection. Alternatively, you can let the user select the language (defaulting to their last used or an auto-detected one); Mastodon and Bluesky do that. For single-language forums and sites with no user-generated content, it's fine to leave everything as the site language.
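
To make that concrete, here is a rough sketch of the server side in Python. `render_comment` and the `comment` class name are made up for illustration and are not how HN or any real site does it. The only thing taken from the HTML spec is that lang="" means "language unknown".

```python
from html import escape
from typing import Optional


def render_comment(text: str, lang: Optional[str] = None) -> str:
    """Wrap a user comment in a <div> that carries a lang attribute.

    lang=None means the author did not pick a language, so we emit
    lang="" (explicitly unknown) instead of inheriting the page language.
    Hypothetical helper, for illustration only.
    """
    tag = "" if lang is None else lang
    return (
        f'<div class="comment" lang="{escape(tag, quote=True)}">'
        f"{escape(text)}</div>"
    )


# Author picked French in a (hypothetical) language selector.
print(render_comment("Bonjour tout le monde", "fr"))
# No selection: mark as unknown and let the client's tools detect it.
print(render_comment("mystery text"))
```

The point of emitting lang="" rather than omitting the attribute is that an omitted attribute inherits the page's language, which would mislabel a comment written in anything else.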

> Other written language has never specified its language. Imposing data-entry requirements on humans to satisfy a computer is never the ideal solution.

There's also no "screen reader" nor "auto translation" in other written language. Setting the content language helps to improve accessibility features that do not exist without computers.

[0] https://www.ethnologue.com/insights/how-many-languages/

I wish this comment was true, but due to a foolish attempt to squish all human characters into 2 bytes as UCS (that failed and turned into the ugly UTF-16 mess), a disaster called Han Unification was unleashed upon the world, and now out-of-band communication is required to render the correct Han characters in a page and not offend people.
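
A small Python sketch of why detection alone can't save you here. U+76F4 (直) is a commonly cited unified ideograph: Japanese and Chinese typography draw it differently, but the text carries a single code point either way. The `tagged` helper is hypothetical; the language tags are ordinary BCP 47 values.

```python
import unicodedata

HAN = "\u76f4"  # 直, one code point shared by Japanese and Chinese text


def tagged(ch: str, lang: str) -> str:
    """Wrap a character in a span with a lang attribute (hypothetical helper)."""
    return f'<span lang="{lang}">{ch}</span>'


ja = tagged(HAN, "ja")
zh = tagged(HAN, "zh-Hans")

# The character data is byte-for-byte identical in both spans;
# only the out-of-band lang tag tells the renderer which glyph to pick.
print(unicodedata.name(HAN))
print(ja)
print(zh)
```

With a single character or a short name there is nothing for a detector to work with, so the tag is the only signal the font stack gets.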