I agree with your first point. I've seen this same issue crop up in several other ORMs.

As to your second point. VARCHAR uses N + 2 bytes where as NVARCHAR uses N*2 + 2 bytes for storage (at least on SQL Server). The vast majority of character fields in databases I've worked with do not need to store unicode values.

> The vast majority of character fields in databases I've worked with do not need to store unicode values.

This has not been my experience at all. Exactly the opposite, in fact. ASCII is dead.

Vast majority of text fields I see are coded values that are perfectly fine using ascii, but I deal mostly with English language systems.

Text fields that users can type into directly especially multiline tend to need unicode but they are far fewer.

Some examples of coded fields that may be known to be ascii: order name, department code, business title, cost center, location id, preferred language, account type…

English has plenty of Unicode — claiming otherwise is such a cliché…

Unicode is a requirement everywhere human language is used, from Earth to the Boöotes Void.

Just to be pedantic, those characters are in 'ANSI'/CP1252 and would be fine in a varchar on many systems.

Not that I disagree — Win32/C#/Java/etc have 16-bit characters, your entire system is already 'paying the price', so weird to get frugal here.

> Unicode is a requirement everywhere human language is used

Strange then how it was not a requirement for many, many years.

Also less awkward to make it right the first time, instead of explaining why someone can’t type their name or an emoji

Specifically not talking about a name field

I am talking about coded values, like Status = 'A', 'B' or 'C'

Taking double the space for this stuff is a waste of resources and nobody usually cares about extended characters here in English language systems at least they just want something more readable than integers when querying and debugging the data. End users will see longer descriptions joined from code tables or from app caches which can have unicode.

It's way better to just use a DBMS that supports enums. I know SQL server isn't one of those but I still don't store my coded values as strings.

[deleted]

Those are all single byte characters in UTF-8.

Generally if it stores user input it needs to support Unicode. That said UTF-8 is probably a way better choice than UTF-16/UCS-2

The one place UTF-16 massively wins is text that would be two bytes as UTF-16, but three bytes as UTF-8. That's mainly Chinese, Japanese, Korean, etc...

UTF-8 is a relatively new thing in MSSQL and had lots of issues initially, I agree it's better and should have been implemented in the product long ago.

I have avoided it and have not followed if the issues are fully resolved, I would hope they are.

> UTF-8 is a relatively new thing in MSSQL and had lots of issues initially, I agree it's better and should have been implemented in the product long ago.

Their insistence on making the rest of the world go along with their obsolete pet scheme would be annoying if I ever had to use their stuff for anything ever. UTF-8 was conceived in 1992, and here we are in 2026 with a reasonably popularly database still considering it the new thing.

I would be more critical of Microsoft choosing to support UCS-2/UTF-16 if Microsoft hadn't completed their implementation of Unicode support in the 90s and then been pretty consistent with it.

Meanwhile Linux had a years long blowout in the early 2000s over switching to UTF-8 from Latin-1. And you can still encounter Linux programs that choke on UTF-8 text files or multi-byte characters 30 years later (`tr` being the one I can think of offhand). AFAIK, a shebang is still incompatible with a UTF-8 byte order mark. Yes, the UTF-8 BOM is both optional and unnecessary, but it's also explicitly allowed by the spec.

In 92 it was a conference talk. In 98 it was adopted by the IETF. Point probably stands though.

the data types were introduced with SQL Server 7 (1998) so i’m not sure it’s accurate to state that it’s considered as the new thing.

To complicate matters SQL Server can do Nvarchar compression, but they should have just done UTF-8 long ago:

https://learn.microsoft.com/en-us/sql/relational-databases/d...

Also UTF-8 is actually just a varchar collation so you don't use nvarchar with that, lol?