I was really confused about case folding; this page explained the motivation well: https://jean.abou-samra.fr/blog/unicode-misconceptions
""" Continuing with the previous example of “ß”, one has lowercase("ss") != lowercase("ß") but uppercase("ss") == uppercase("ß"). Conversely, for legacy reasons (compatibility with encodings predating Unicode), there exists a Kelvin sign “K”, which is distinct from the Latin uppercase letter “K”, but also lowercases to the normal Latin lowercase letter “k”, so that uppercase("K") != uppercase("K") but lowercase("K") == lowercase("K").
The correct way is to use Unicode case folding, a form of normalization designed specifically for case-insensitive comparisons. Both casefold("ß") == casefold("ss") and casefold("K") == casefold("K") are true. Case folding usually yields the same result as lowercasing, but not always (e.g., “ß” lowercases to itself but case-folds to “ss”). """
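The quoted behavior is easy to verify in Python, which exposes Unicode case folding directly via `str.casefold()` (a rough demonstration, not an exhaustive treatment):

```python
# "ß" vs "ss": lowercasing disagrees, uppercasing and case folding agree.
assert "ß".lower() != "ss".lower()        # "ß" lowercases to itself
assert "ß".upper() == "ss".upper()        # both uppercase to "SS"
assert "ß".casefold() == "ss".casefold()  # both case-fold to "ss"

# KELVIN SIGN (U+212A) vs Latin "K": distinct code points,
# but both lowercase (and case-fold) to plain "k".
kelvin, latin_k = "\u212A", "K"
assert kelvin != latin_k
assert kelvin.upper() != latin_k.upper()  # Kelvin sign has no uppercase mapping
assert kelvin.lower() == latin_k.lower() == "k"
assert kelvin.casefold() == latin_k.casefold()
```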
One question I have is why have a Kelvin sign that is distinct from Latin K, and other such indistinguishable symbols? To make quantities machine readable (oh, this is not a 100K license plate or money amount, but a temperature)? Or to make it easier for specialized software to display it in the correct place/units?
They seem to have (if I understand correctly) degree-Celsius and degree-Fahrenheit symbols. So maybe Kelvin is included for consistency, and it just happens to look identical to Latin K?
IMO the confusing bit is giving it a lowercase mapping. It is a symbol that happens to look like an uppercase letter, not an actual letter…
And why can't the symbol be a regular old uppercase "K"? Who is this helping?
Unicode wants to preserve round-trip re-encoding from this other standard, which has separate letter-K and degree-K characters. Making these small sacrifices for compatibility is how Unicode became the de facto world standard.
The "other standard" in this case being IBM-944. (At least looking at https://www.unicode.org/versions/Unicode1.0.0/ch06.pdf p. 574 (=110 in the PDF) I only see a mapping from U+212A to that one.)
The ICU mapping files have entries for U+212A in the following files:
That "deeper explanation" seems incorrect, considering that the KSC column is empty in the mapping linked above.
I think just using uppercase Latin K is the recommendation.
But, I dunno. Why would anybody apply upper or lower case operators to a temperature measurement? It just seems like a nonsense thing to do.
Maybe not for text to be read again, but might be sensible e.g. for slug or file name generation and the like...
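A minimal sketch of that idea (the `slugify` helper is hypothetical; it uses Python's stdlib `unicodedata`). NFKC normalization folds compatibility characters like the Kelvin sign into their ordinary counterparts, and case folding then makes the result case-insensitive:

```python
import re
import unicodedata

def slugify(text: str) -> str:
    # NFKC folds compatibility characters (KELVIN SIGN U+212A -> Latin "K"),
    # casefold handles case-insensitivity (including "ß" -> "ss"),
    # then collapse everything else into hyphens.
    folded = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"[^a-z0-9]+", "-", folded).strip("-")

# "300 K" and "300 \u212A" (Kelvin sign) produce the same slug: "300-k"
```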
That’s an interesting thought.
IMO this is somewhere where if we were really doing something, we might as well go all the way and double check the relevant standards, right? The filesystem should accept some character set for use as names, and if we’re generating a name inside our program we should definitely find a character set that fits inside what the filesystem expects and that captures what we want to express… my gut says upper case Latin K would be the best pick if we needed to most portably represent Kelvin in a filename on a reasonably modern, consumer filesystem.
I wonder if you can register a domain with it in the name.
A symbol may look different from the original letter, for example N vs. №, E (Є) vs. €, S vs. $ or the integral sign, C vs. ©, TM vs. ™, a vs. @, and so on.
However, those symbols don't have lowercase variants. Moreover, lowercase k means kilo-, not a «smaller kelvin».
Although it is a prefix in that case, so we shouldn't expect to see k alone.
To maximally confuse things, I suggest we start using little k alone to resolve another annoying unit issue: let’s call 1 kilocalorie “k.”
Probably useful in a non-Latin codeset?
Having a dedicated Kelvin symbol preserves the semantics.
> One question I have is why have Kelvin sign that is distinct from Latin K and other indistinguishable symbols?
To allow round-tripping.
Unicode did not win by being better than all previously existing encodings, even though it clearly was.
It won by being able to coexist with all those other encodings for years (decades) while the world gradually transitioned. That required the ability to take text in any of those older encodings and transcode it to Unicode and back again without loss (or "gain"!).
> One question I have is why have Kelvin sign that is distinct from Latin K and other indistinguishable symbols?
Unicode has the goal of being a 1:1 mapping for all other character encodings. Usually weird things like this exist so there can be a 1:1 reversible mapping to some ancient character encoding.