From https://swipe.futo.tech/:
The ContextLM model is a very small language model that is trained for a single language. It's used to improve the quality of predictions by eliminating nonsensical words given the preceding words in the sentence. It only requires text data for training.
So it would need one model per language? Not impossible (for me)...
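To make the quoted idea concrete, here is a rough sketch of what "eliminating nonsensical words given the preceding words" could look like, with one model per language. This is not FUTO's ContextLM: I'm swapping in a plain bigram count table as a stand-in, and the names (train_bigram_counts, rerank, corpora) and the tiny sample sentences are all made up for illustration. It only shows that raw text is enough to train something that prefers words which plausibly follow the previous word.

```python
from collections import Counter, defaultdict


def train_bigram_counts(sentences):
    """Count how often each word follows each preceding word (text-only training)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts


def rerank(counts, prev_word, candidates):
    """Order candidate words by how often they followed prev_word in training."""
    seen = counts[prev_word.lower()]
    return sorted(candidates, key=lambda w: seen[w.lower()], reverse=True)


# Hypothetical toy corpora: one separate model per language.
corpora = {
    "en": ["i will see you later", "see you soon", "i will call you later"],
    "de": ["ich sehe dich später", "wir sehen uns später"],
}
models = {lang: train_bigram_counts(text) for lang, text in corpora.items()}

# Candidates a swipe decoder might propose after the word "you".
print(rerank(models["en"], "you", ["latex", "later", "ladder"]))
```

A real model would be a small neural network rather than a count table, but the per-language split would look the same: separate text data in, separate model out.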