I need to call out a myth about UTF-8. Tools built to assume UTF-8 are not backwards compatible with ASCII. An encoding doesn't just INCLUDE a set of characters; it also EXCLUDES everything outside that set. When a tool is set to use UTF-8, it will process an ASCII stream just fine, but it will not filter out non-ASCII the way a strict ASCII tool would.
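A minimal Python sketch of both halves of that (the sample strings are mine, purely for illustration):

    # Pure ASCII decodes fine under UTF-8, because ASCII is a strict subset.
    ascii_data = b"plain ASCII text"
    print(ascii_data.decode("utf-8"))   # works

    # But UTF-8 also accepts bytes outside the ASCII range, so nothing in
    # a UTF-8 chain flags them for a downstream ASCII-only consumer.
    mixed_data = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'
    print(mixed_data.decode("utf-8"))         # also works; nothing filtered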

I still use some tools that assume ASCII input. For many years now, Linux tools have been removing the ability to specify ASCII as the default encoding, leaving UTF-8 as the only relevant choice. This has caused me extra work: if the data processing chain goes through these tools, I have to manually inspect the data for any non-ASCII noise that has been introduced along the way. I mostly use those older tools on Windows now, because most Windows tools still let you set ASCII as the default.
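The inspection step is roughly this hypothetical scan (the filename argument and output format are my own invention):

    import sys

    # Report every byte outside the 7-bit ASCII range, with its offset,
    # so the noise can be located before it reaches an ASCII-only tool.
    data = open(sys.argv[1], "rb").read()
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            print(f"offset {offset}: 0x{byte:02x}")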

Do you have an actual example where this causes an issue? "ASCII" tools mostly just passed along non-ASCII bytes unchanged even before UTF-8.

The usual statement isn't that UTF-8 is backwards compatible with ASCII (it's obvious that any 8-bit encoding wouldn't be; that's why we have UTF-7!). It's that UTF-8 is backwards compatible with tools that are 8-bit clean.
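A quick way to see the difference, sketched in Python:

    # UTF-8 encodes non-ASCII characters as bytes >= 0x80, so it needs an
    # 8-bit clean channel; UTF-7 stays entirely within 7-bit ASCII bytes.
    s = "caf\u00e9"
    print(s.encode("utf-8"))  # b'caf\xc3\xa9'  (high-bit bytes)
    print(s.encode("utf-7"))  # b'caf+AOk-'    (pure 7-bit ASCII)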

Yes, the myth I was pointing out is based on loose terminology. It needs to be made clear that "backwards compatible" means UTF-8-based tools can receive valid ASCII but are not constrained to emit it. I see a lot of comments implying that UTF-8 can interact with an ASCII ecosystem without causing problems. Even worse, it seems most Linux developers believe there is no longer any need to provide a default ASCII setting once they have UTF-8.
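In code terms, roughly (a sketch; the sample strings are invented):

    # Receiving ASCII is free, because ASCII is a subset of UTF-8...
    text = b"plain ASCII input".decode("utf-8")

    # ...but emitting valid ASCII is a separate constraint that has to be
    # enforced explicitly; UTF-8 itself never promises it.
    output = text + " \u00e9"            # processing can introduce non-ASCII
    try:
        output.encode("ascii")           # strict mode raises on non-ASCII
    except UnicodeEncodeError as err:
        print(f"non-ASCII leaked into the output: {err}")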

That's not a myth about UTF-8. That's a decision by tools not to support pure ASCII.