Hacker News

> The UTF-8 standard was adopted really in 1998-ish and the standard was already variable using 1 to 4 bytes.

No, it was 1 to 6 bytes until RFC 3629 (Nov 2003). AFAIK development of MySQL 4.1 began prior to that, despite the release not happening until afterwards.

Again, they absolutely should have addressed it sooner. But people make mistakes, especially as we're talking about a venture-funded startup in the years right after the dot-com crash.

> It literally does not take any more storage because it is a variable width encoding.

I already addressed that in my previous comment: in old versions of MySQL, a number of critical code paths required allocating worst-case buffer sizes, or accounting for worst-case value lengths in indexes, etc. So if a charset allows 6 bytes per character, that means multiplying max length by 6, in order to handle the pathological case.