> For example, how do you handle UTF-8 encoded surrogate pairs?

Surrogate pairs aren’t applicable to UTF-8. That range of code points (U+D800 to U+DFFF) is simply invalid in UTF-8 and should be treated as such (as a parsing error, as invalid characters, etc.).
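To make that concrete, here is a minimal sketch using Python's built-in codec as one example of a strict decoder. The helper name `is_strict_utf8` is made up for illustration, and other decoders may be more lenient:

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+1F4A9 encoded directly as a single 4-byte sequence: well-formed.
direct = b"\xf0\x9f\x92\xa9"

# The surrogate pair U+D83D U+DCA9 with each half encoded as its own
# 3-byte sequence (what CESU-8 style encoders emit): ill-formed UTF-8.
surrogates = b"\xed\xa0\xbd\xed\xb2\xa9"

print(is_strict_utf8(direct))
print(is_strict_utf8(surrogates))
```

Whether to raise an error or substitute U+FFFD on the second input is a policy choice; the point is only that a conforming decoder should not silently turn it into U+1F4A9.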

In theory, yes. In practice, there are throngs of parsers and converters that might handle such cases differently. https://seriot.ch/projects/parsing_json.html

I mean hopefully not, but the linked example is about JSON parsing, not UTF-8.

A big chunk of the bugs there are Unicode-related; that is my point. When people parse JSON, they don't realize that they are also parsing Unicode.
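A small sketch of how the two overlap, assuming Python's standard `json` module (other parsers may behave differently, which is rather the point). The helper name `encodes_as_utf8` is made up for illustration:

```python
import json


def encodes_as_utf8(text: str) -> bool:
    """Return True if `text` can be encoded as strict UTF-8 (hypothetical helper)."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# An escaped surrogate pair: the parser combines it into one code point.
paired = json.loads('"\\ud83d\\udca9"')

# An escaped lone high surrogate: this parser accepts it and hands back
# a string containing an unpaired surrogate.
lone = json.loads('"\\ud83d"')

print(encodes_as_utf8(paired))
print(encodes_as_utf8(lone))
```

The JSON parse succeeds in both cases here, but the second result cannot be written back out as UTF-8, so the Unicode problem surfaces later and somewhere else.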