Hacker News

UTF-8 does not need a BOM at all and never needed it, for two reasons:

- first, byte order doesn't affect the UTF-8 encoding,

- second, the codeset metadata problem you're trying to solve already existed before UTF-8 and still exists now that UTF-8 is on the scene -- you just have to know whether some text file (or whatever) uses UTF-8, ISO 8859-x, Shift_JIS, UTF-16, etc.
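
To make the first point concrete, here is a minimal Python sketch. The sample character (U+20AC) is an arbitrary choice for illustration; any non-ASCII character would do. UTF-8 produces a single byte sequence for it, while UTF-16 produces two different ones depending on byte order, which is the ambiguity a BOM exists to resolve.

```python
# Sample character chosen purely for illustration: U+20AC EURO SIGN.
ch = "\u20ac"

utf8 = ch.encode("utf-8")          # byte-oriented: only one possible sequence
utf16_le = ch.encode("utf-16-le")  # 16-bit code unit, little-endian
utf16_be = ch.encode("utf-16-be")  # same code unit, big-endian

print(utf8.hex())      # e282ac
print(utf16_le.hex())  # ac20
print(utf16_be.hex())  # 20ac
```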

The second point addresses your concern, but that metadata has to be out of band. Putting it in-band creates the sorts of problems that others have pointed out, and it creates an annoyance once all non-Unicode locales are gone. And since the goal is to have Unicode replace all other codesets, and since we've made a great deal of progress in that direction, there is no need now to add this wart.

mikelabatt a day ago

Thanks for your insights. I did change my mind about the need for a BOM (though not about the need to be able to parse/skip it if present).
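
Skipping a BOM when one is present is cheap. A minimal Python sketch, where `strip_utf8_bom` is a hypothetical helper name and the input is assumed to be the raw bytes of a file:

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Works the same whether or not the input starts with a BOM.
text = strip_utf8_bom(b"\xef\xbb\xbfhello").decode("utf-8")
print(text)  # hello
```

In Python specifically, decoding with the `utf-8-sig` codec has the same effect: it drops a leading BOM if there is one and otherwise behaves like plain UTF-8.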

In a future where everything defaults to UTF-8 it makes sense. This is probably easier to envision in an English-only context where the jump from 7-bit ASCII to UTF-8 is cleaner.

Where I come from, UTF-8 is not always supported. Without a header (or "BOM", though we don't like the name) you don't know which encoding a text file was meant to be (re-)saved in when it was created. My example of an empty file was meant to illustrate that. But leaning on the Utopian side, I too shall put more energy towards all apps supporting UTF-8 :)

cryptonector a day ago

Excellent!

Yeah, UTF-8 by default (or better, as the only option) is the dream.

Keep in mind that if you do use a BOM for UTF-16 then it's possible to reliably tell that some file is in UTF-8.
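
A rough sketch of that check in Python, under some assumptions: `guess_encoding` is a hypothetical helper, UTF-16 files are assumed to always carry a BOM, and UTF-32 (whose little-endian BOM also starts with FF FE) is ignored.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: classify bytes as 'utf-16', 'utf-8', or 'unknown'."""
    # Assumption: UTF-16 always announces itself with a BOM (FF FE or FE FF).
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No UTF-16 BOM: check whether the bytes are well-formed UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"  # some legacy codeset; needs out-of-band metadata
    return "utf-8"
```

Pure ASCII input and the empty file both come back as `utf-8` here, which is only right if UTF-8 is the default. Text in a legacy 8-bit codeset can also happen to be well-formed UTF-8, though that is unlikely for anything beyond very short non-ASCII runs.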