Ok.
Now try to separate the "learning the language" from "learning the data".
If we have a model pre-trained on language, does it then learn concepts faster, at the same rate, or differently?
Can we lossily compress just the data into an LLM-like kernel that regenerates the input to a given level of fidelity?
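As a toy version of that idea: a minimal sketch where the "LLM-like kernel" is stood in for by a tiny parametric model (a low-degree polynomial), fit to a data series and then used to regenerate it. All names and the choice of model here are illustrative assumptions, not a claim about how an actual LLM-based codec would work — just the compress-to-parameters / regenerate-to-fidelity loop in miniature.

```python
# Illustrative sketch: "compress" 20 data points into 3 model parameters
# (a degree-2 polynomial fit), then regenerate the input approximately.
# The polynomial is a stand-in for any learned kernel; names are made up.

def fit_poly(xs, ys, degree):
    # Least-squares fit via the normal equations, solved with
    # Gaussian elimination (partial pivoting). Returns coefficients
    # c[0] + c[1]*x + c[2]*x^2 + ...
    n = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j]
                                for j in range(i + 1, n))) / A[i][i]
    return coeffs

def regenerate(xs, coeffs):
    # "Decompress": rebuild the series from the model parameters alone.
    return [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]

xs = list(range(20))
ys = [0.5 * x * x - 3 * x + 7 for x in xs]   # data with simple structure
coeffs = fit_poly(xs, ys, degree=2)          # 3 numbers instead of 20
approx = regenerate(xs, coeffs)
max_err = max(abs(a - b) for a, b in zip(ys, approx))  # fidelity measure
```

The fidelity knob is the model capacity: raising the degree (more parameters) lowers `max_err`, which is the same trade-off the question points at, just with a far more expressive model doing the memorizing.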