Imho ASCII wasted 33 of its precious 128 values (0-31 plus DEL) on control characters nobody ever needs (except perhaps in the first few years of its lifetime) and could easily have had the degree symbol, the pilcrow, the section sign, a forward tick and other useful symbols instead :)
Smaller, 6-bit character codes existed before and after it. They did not even have room for both upper and lower case letters, yet they had control characters. Those codes directly moved the paper, switched to the next punch card, or cut the punched tape on the receiving end, so you would want them whenever you had to send more than a single line of text (or a block of data), which most users did.
The even smaller 5-bit Baudot code already had special characters to shift between its two character sets and to discard the previous character. The Murray code, used for typewriter-based devices, introduced CR and LF, so control characters were quite frequently needed for way more than a few years.
Maybe 32 was a bit much, but even fitting a useful set of control characters into, say, 16 would be tricky, at least for me. For example, ^S and ^Q are still useful when text is scrolling by too fast.
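(Fun aside: the Ctrl-key mapping is literally just bit masking - ^S is the byte for 'S' with the top three bits cleared, which lands on DC3/XOFF. A quick Python check, if you're curious:)

    # Ctrl+<letter> is the letter's code with the top three bits masked off,
    # so ^S is DC3 (0x13, XOFF) and ^Q is DC1 (0x11, XON).
    for ch in "SQ":
        print(f"^{ch} -> {ord(ch) & 0x1f:#04x}")
    # ^S -> 0x13
    # ^Q -> 0x11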
On top of the control symbols being useful, providing those symbols would have reduced the motivation for Unicode, right?
ASCII did us all the favor of hitting a good stopping point and leaving the “infinity” solution to the future.
I started using the separator characters (file, group, record, unit separator, ASCII 60-63 ... though mostly the last two) for CSV-like data to store in a database. Not looking back!
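Roughly like this, in Python (a minimal sketch - RS/US are the record and unit separators, and the only rule is that field values can't contain those two bytes):

    RS, US = "\x1e", "\x1f"  # record separator, unit separator

    def encode(rows):
        # rows is a list of records, each a list of string fields
        return RS.join(US.join(fields) for fields in rows)

    def decode(blob):
        return [record.split(US) for record in blob.split(RS)]

    data = [["id", "name"], ["1", "Ada, Countess of Lovelace"]]
    assert decode(encode(data)) == data  # commas/newlines in fields are fine

The trade-off is exactly what a comment below asks about: generic CSV tooling won't import it unless it lets you specify custom delimiters.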
ASCII 60-63 is just <=>?
You probably mean 28-31 (∟↔▲▼, or ␜␝␞␟)
Unless this is octal notation? But 0o60-0o63 in octal is 0123
I've wanted to do that, but don't you have compatibility problems? What can read/import files with those delimiters? Don't the people you work with have problems?
Only that would have broken the whole thing back in the day ;)
It is interesting that, as a guess, we waste an average of ~5% of text storage capacity (the 32 control codes are 12.5% of the 256 possible byte values, but many languages regularly use multi-byte characters, of course).
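To make the back-of-envelope explicit (the 40% share of ASCII-range bytes below is a made-up illustration, not measured data):

    import math

    # Naive view: the 32 C0 controls eat 32/256 = 12.5% of the byte-value space.
    value_waste = 32 / 256

    # Averaged guess: only ASCII-range bytes "pay" that cost; assume (made up)
    # that 40% of bytes in mixed-language text are in that range.
    print(f"average waste ~ {value_waste * 0.40:.1%}")  # ~5.0%

    # Stricter information-theoretic view: a byte restricted to 224 usable
    # values carries log2(224) bits instead of 8, i.e. only ~2.4% less.
    print(f"entropy waste ~ {1 - math.log2(224) / 8:.1%}")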
I don't fault the creators of ASCII - those control characters were probably needed at the time. The fault is ours for not moving on from the legacy technology. I think some non-ASCII/Unicode encodings did reuse the control-character bytes. Why didn't Unicode do that? I assume they were trying to be compatible with some existing encodings, but couldn't they have chosen the encodings that made use of the control-character code points?
If Unicode were to change it now (probably not happening, but imagine ...), what would they do with those 32 code points? We couldn't move other common characters over to them - those already have well-known, heavily used code points in Unicode, and also iirc Unicode promises backward compatibility with prior versions.
There are still scripts and glyphs not in Unicode, but those are mostly quite rare and would effectively continue to waste the space. Is there some set of characters that would actually be used and be a good fit? Duplicate the most commonly used code points above the 8-bit range, as a form of compression? Duplicate combining characters? Hold a contest? Make it a private use area - I imagine we could do that anyway, because I doubt most systems interpret those bytes now.
Also, how much old data that legitimately uses the ASCII control characters would become unreadable?