I think this is an analogy that's been taken far too far. The output of intelligence just isn't compression; compression is memorization. The role of intelligence is to generate novelty.
It's true that LLMs do something that looks very compression-like in their weights, but it is lossy, and it has to be: if you're not lossy, you've overfitted the corpus, and that's bad. Post-training takes this even further, because you're no longer doing anything that looks like training on a specific corpus; you're exploring a wider space of text. That text doesn't even concretely exist until you start exploring it.
I'm sure there must be a serious attempt to pursue this analogy that isn't just hand-waving, but I haven't seen one.
LLM compression doesn't necessarily have to be lossy.
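You can use the fact that LLMs predict P(next token | existing tokens) to losslessly and efficiently compress arbitrary token sequences: feed those predicted probabilities into an arithmetic coder, and each token costs roughly -log2 P(token | previous tokens) bits. The better the model predicts the text, the shorter the output.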
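Here's a minimal sketch of that idea, assuming a toy adaptive bigram model (the hypothetical `next_token_probs`) as a stand-in for an LLM's next-token distribution. It uses exact `Fraction` arithmetic instead of the finite-precision integer coder a real implementation would use, and it sends the message length alongside the code instead of relying on an end-of-sequence token. `encode` and `decode` are illustrative names, not any library's API.

```python
from fractions import Fraction
from math import ceil

# Hypothetical stand-in for an LLM: an adaptive bigram model over a tiny
# alphabet. Any model that deterministically returns P(next | prefix) works.
ALPHABET = "abc "

def next_token_probs(prefix):
    """Return {token: Fraction} giving P(next token | prefix)."""
    counts = {t: 1 for t in ALPHABET}  # add-one smoothing: no zero probabilities
    prev = prefix[-1] if prefix else None
    # Count which tokens have followed the current last token so far.
    for a, b in zip(prefix, prefix[1:]):
        if a == prev:
            counts[b] += 1
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in counts.items()}

def encode(text):
    """Arithmetic-code text; returns (k, n, length): k is the n-bit code."""
    low, width = Fraction(0), Fraction(1)
    for i, tok in enumerate(text):
        probs = next_token_probs(text[:i])
        # Skip past the sub-intervals of tokens ordered before tok.
        for t in ALPHABET:  # fixed order so encoder and decoder agree
            if t == tok:
                break
            low += width * probs[t]
        width *= probs[tok]  # likely tokens shrink the interval less
    # Find the shortest binary fraction k / 2**n inside [low, low + width).
    n = 0
    while True:
        k = ceil(low * 2**n)
        if Fraction(k, 2**n) < low + width:
            return k, n, len(text)
        n += 1

def decode(k, n, length):
    """Invert encode by re-running the same model on the recovered prefix."""
    value = Fraction(k, 2**n)
    low, width = Fraction(0), Fraction(1)
    out = ""
    for _ in range(length):
        probs = next_token_probs(out)
        for t in ALPHABET:
            sub = width * probs[t]
            if value < low + sub:  # value falls in t's sub-interval
                out += t
                width = sub
                break
            low += sub
    return out

msg = "abba cab abba cab abba"
k, n, length = encode(msg)
assert decode(k, n, length) == msg  # lossless round trip
print(n, "bits vs", 8 * len(msg), "bits as plain ASCII")
```

Decoding works only because the decoder can rerun the same deterministic model on the tokens it has already recovered. Nothing is stored in the model beyond its predictive ability; a model that predicts the text well just gives the actual tokens high probability, so the interval shrinks less and fewer bits are needed.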
True, but it's not relevant because that isn't how we actually train LLMs for use as quasi-intelligent tools. We specifically do not want the model to be able to just memorize its input, which is what your process requires.
Many things about the process are similar, so there's some analogy, but it just isn't the same.