Impressive. It's worth reading despite the slight AI sheen to the writing, as it's unusually informative relative to most security articles. The primary takeaway from my POV is to watch out for "helpful" string normalization calls in security-sensitive software. Strings should be bags of bytes as much as possible. A lot of the exploits boil down to treating security identifiers as text instead of fixed numeric sequences. Also, even things that look trivial, like file paths in error messages, can be deadly.
My take on the normalization is that it happens in the wrong place: you should not do it ad hoc.
If your input from the user is a string, define a newtype like UserName and do all validation and normalization once, at the point where you convert to it. All subsequent code should use that type and not raw strings, so the result is consistent everywhere.
It's ridiculous that we haven't been aggressively boxing login credentials for decades at this point. This kind of issue was well discussed when I did my degree well over a decade ago.
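For anyone who hasn't seen the pattern, here is a minimal Rust sketch of what I mean. The normalization policy (trim, ASCII-lowercase, at most 32 characters from a small allowed set) and the error names are made up for illustration; a real system would pick its own rules.

```rust
/// A username that has already been validated and normalized.
/// The inner field is private, so code outside this module can only
/// obtain a value by going through `UserName::parse`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct UserName(String);

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum UserNameError {
    Empty,
    TooLong,
    InvalidChar(char),
}

impl UserName {
    /// The single place where validation and normalization happen.
    /// Assumed policy: trim whitespace, ASCII-lowercase, then require
    /// 1 to 32 characters drawn from a-z, 0-9, '_', '.', '-'.
    pub fn parse(raw: &str) -> Result<Self, UserNameError> {
        let normalized = raw.trim().to_ascii_lowercase();
        if normalized.is_empty() {
            return Err(UserNameError::Empty);
        }
        if normalized.chars().count() > 32 {
            return Err(UserNameError::TooLong);
        }
        // Report the first character that falls outside the allowed set.
        if let Some(bad) = normalized.chars().find(|&c| {
            !(c.is_ascii_lowercase() || c.is_ascii_digit() || matches!(c, '_' | '.' | '-'))
        }) {
            return Err(UserNameError::InvalidChar(bad));
        }
        Ok(UserName(normalized))
    }

    /// Read-only access to the normalized form.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Everything downstream (lookups, comparisons, audit logs) takes `&UserName` rather than `&str`, so two spellings of the same name can't be normalized differently in two code paths.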
It’s the same discussion as “don’t use floating point for money” and yet I’ve seen it done at every startup I’ve joined with all the same mistakes.
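The money version of the same idea, as a rough Rust sketch. The `Cents` type and its methods are names I'm inventing here, and it assumes a single currency with two decimal places.

```rust
/// Money stored as a whole number of minor units (cents), never as a float.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Cents(i64);

impl Cents {
    pub fn new(cents: i64) -> Self {
        Cents(cents)
    }

    /// Addition that reports overflow instead of silently wrapping.
    pub fn checked_add(self, other: Cents) -> Option<Cents> {
        self.0.checked_add(other.0).map(Cents)
    }

    pub fn value(self) -> i64 {
        self.0
    }
}

/// The classic failure with binary floating point: this returns false,
/// because 0.1 + 0.2 is not exactly representable as 0.3 in an f64.
pub fn float_sum_is_exact() -> bool {
    0.1_f64 + 0.2_f64 == 0.3_f64
}
```

With the integer type, ten cents plus twenty cents is exactly thirty cents, and the only failure mode left is overflow, which `checked_add` surfaces explicitly.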
And if the type turns out to be wrong for whatever reason, you can be confident your fixes will propagate everywhere the type is used. If the situation is extremely bad, especially the sort where every caller must still do something that you can't offload entirely onto the type (such as an entirely new set of methods and flow for correct usage), you can define a brand-new type, and the compiler will guide you through fixing the entire system as you push the new type in and remove the old one.
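A sketch of the "brand-new type" migration in Rust, under made-up assumptions: the old type only trimmed whitespace, and the new rule (ASCII only, lowercased) is purely illustrative. `LegacyUserName`, `upgrade`, and `grant_access` are hypothetical names.

```rust
/// Old type: its normalization (trim only) turned out to be too weak.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LegacyUserName(String);

impl LegacyUserName {
    pub fn new(raw: &str) -> Self {
        LegacyUserName(raw.trim().to_string())
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// New type with the stricter, assumed rule: ASCII only, lowercased.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserName(String);

impl UserName {
    /// Explicit, fallible upgrade path from the old type.
    pub fn upgrade(old: &LegacyUserName) -> Result<Self, String> {
        let s = old.as_str();
        if s.is_empty() || !s.is_ascii() {
            return Err(format!("cannot upgrade {:?}", s));
        }
        Ok(UserName(s.to_ascii_lowercase()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Once a signature is switched to the new type, every caller that still
/// passes a `LegacyUserName` stops compiling until it is fixed.
pub fn grant_access(user: &UserName) -> String {
    format!("access granted to {}", user.as_str())
}
```

You change signatures like `grant_access` one at a time, follow the compile errors outward, and delete `LegacyUserName` once nothing references it.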
I've done all these things in fairly high-security contexts where I had a very critical username normalization step. It's a very valuable tool.
Yeah, I tolerated the AI tint in this article only because it was very informative otherwise.