Ok.
Now try to separate the "learning the language" from "learning the data".
If we have a model pre-trained on language, does it then learn concepts faster, at the same rate, or differently?
Can we lossily compress just the data into an LLM-like kernel that regenerates the input to a given level of fidelity?
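As a toy version of that idea: a minimal sketch where the "LLM-like kernel" is stood in for by a tiny parametric model (a low-degree polynomial), fit to a data series and then used to regenerate it. All names and the choice of model here are illustrative assumptions, not a claim about how an actual LLM-based codec would work — just the compress-to-parameters / regenerate-to-fidelity loop in miniature.

```python
# Illustrative sketch: "compress" 20 data points into 3 model parameters
# (a degree-2 polynomial fit), then regenerate the input approximately.
# The polynomial is a stand-in for any learned kernel; names are made up.

def fit_poly(xs, ys, degree):
    # Least-squares fit via the normal equations, solved with
    # Gaussian elimination (partial pivoting). Returns coefficients
    # c[0] + c[1]*x + c[2]*x^2 + ...
    n = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j]
                                for j in range(i + 1, n))) / A[i][i]
    return coeffs

def regenerate(xs, coeffs):
    # "Decompress": rebuild the series from the model parameters alone.
    return [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]

xs = list(range(20))
ys = [0.5 * x * x - 3 * x + 7 for x in xs]   # data with simple structure
coeffs = fit_poly(xs, ys, degree=2)          # 3 numbers instead of 20
approx = regenerate(xs, coeffs)
max_err = max(abs(a - b) for a, b in zip(ys, approx))  # fidelity measure
```

The fidelity knob is the model capacity: raising the degree (more parameters) lowers `max_err`, which is the same trade-off the question points at, just with a far more expressive model doing the memorizing.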