Everything is a tradeoff, and different use cases require different tradeoffs:

Option A: this model

Option B: faster model, only 1 language

Option C: same size model, only 1 language but higher quality

My point is that option A isn’t always best.

And on the borrowed words bit, there’s no rule that we cannot add borrowed words into the vocab. But you don’t need the whole language. I know what deja voux means but I don’t speak French.

that depends entirely on how common the borrowed thing is. And anyway, option A is always going to be insufficient for my code-switching example -- as another commenter pointed out, it is very common to want to refer to a foreign work (song, movie, book) by its foreign language title. Monolingual ASR solutions break over this all the time. Try asking Alexa to play a Spanish language track on Spotify. It fails frequently.

The real world is like that.