I came across this a week ago when I was looking at some LLM-generated code for a ToUpper() function. At some point I “knew” this relationship, but I didn’t really “grok” it until I read a function that converted lowercase ASCII to uppercase by using a bitwise XOR with 0x20.
It makes sense, but it didn’t really hit me until recently. Now I’m wondering what other hidden cleverness is out there that used to be common knowledge but is now lost in the abstractions.
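For anyone who hasn’t seen it spelled out, here’s a minimal C sketch of the relationship (just standard ASCII, nothing specific to the code I was reading):

```c
#include <stdio.h>

/* The case bit: in ASCII the only difference between an upper- and
 * lowercase letter is bit 5 (0x20), so XOR with 0x20 toggles case
 * for a byte that is already known to be a letter. */
int main(void) {
    printf("'A' = 0x%02X, 'a' = 0x%02X\n", 'A', 'a'); /* 0x41 vs 0x61 */
    printf("'a' ^ 0x20 = '%c'\n", 'a' ^ 0x20);        /* 'A' */
    printf("'Z' ^ 0x20 = '%c'\n", 'Z' ^ 0x20);        /* 'z' */
    return 0;
}
```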
A similar bit-flipping trick was used to swap between the unshifted characters on the numeric-row and symbol keys, and the shifted symbols on the same keys. These bit-flips made it easier to build the circuits for keyboards that output ASCII.
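A rough sketch of what that looked like on the old bit-paired layout (the digit row is the clean example; exact pairings varied by keyboard):

```c
#include <stdio.h>

/* Bit-paired keyboards: the shifted symbol for a digit key is the same
 * ASCII code with bit 4 (0x10) flipped, so the shift key only had to
 * change one bit. Digits 1..9 pair with ! " # $ % & ' ( ). */
int main(void) {
    for (int c = '1'; c <= '9'; c++)
        printf("'%c' (0x%02X)  shift ->  '%c' (0x%02X)\n",
               c, c, c ^ 0x10, c ^ 0x10);
    return 0;
}
```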
I believe the layout of the shifted symbols on the numeric row was based on an early IBM Selectric typewriter for the US market. Then IBM went and changed it, and that later arrangement is the origin of the ANSI keyboard layout we have now.
xor should toggle?
That's right, XOR toggles the case bit. A toUpper that doesn't check its input would clear the bit instead (AND with ~0x20); OR with 0x20 goes the other way and forces lowercase.
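For reference, a quick sketch of how the three forms behave on a letter byte:

```c
#include <stdio.h>

/* The three bitwise forms on an ASCII letter byte:
 *   c | 0x20   forces lowercase
 *   c & ~0x20  forces uppercase
 *   c ^ 0x20   toggles case (only safe once you know c is a letter) */
int main(void) {
    int c = 'G';
    printf("OR : '%c'\n", c | 0x20);   /* 'g' */
    printf("AND: '%c'\n", c & ~0x20);  /* 'G' */
    printf("XOR: '%c'\n", c ^ 0x20);   /* 'g' */
    return 0;
}
```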
I left out that on the line before there was a check to make sure the input byte was between ‘a’ and ‘z’. That guard means a char that’s already upper case is never touched, so the XOR can’t flip it back to lowercase. And once you know the byte is a lowercase letter, clearing the bit with AND, toggling it with XOR, or even subtracting 0x20 would all work. For some reason the LLM thought the XOR was faster.
I honestly wouldn’t have thought anything of it if I hadn’t seen it written as `b ^ 0x20`.
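Roughly, the shape of it was something like this (my reconstruction, not the generated code verbatim, and the function name is mine):

```c
#include <stdio.h>

/* Reconstruction of the shape of the generated function: the range
 * check guarantees the byte is a lowercase letter, after which the
 * XOR just clears the case bit. */
static unsigned char to_upper_ascii(unsigned char b) {
    if (b >= 'a' && b <= 'z')
        return b ^ 0x20;    /* the expression that caught my eye */
    return b;
}

int main(void) {
    const char *s = "Hello, world!";
    for (const char *p = s; *p; p++)
        putchar(to_upper_ascii((unsigned char)*p));
    putchar('\n');          /* prints: HELLO, WORLD! */
    return 0;
}
```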