The mechanism would be prediction (learnt during training), not decompression.

It's the same as LLMs being able to "decode" Base64 (or work with sub-word tokens, for that matter): the model simply learns to predict that:

<compressed representation> will be followed (or preceded) by <decompressed representation>.
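A toy sketch of what "prediction, not decompression" could look like: instead of running the Base64 algorithm, a predictor memorizes which encoded chunks co-occurred with which raw bytes in its training data, then "decodes" new input by pure recall. (This is a deliberately crude illustration; real LLMs learn soft statistical associations over sub-word tokens, not an exact lookup table. The function names here are made up for the example.)

```python
import base64

def train(examples: list[str]) -> dict[str, bytes]:
    """'Training': memorize which 4-char Base64 chunk co-occurs with
    which 3-byte group. Base64 encodes each 3-byte group independently,
    so chunk-level associations generalize across inputs."""
    table = {}
    for text in examples:
        raw = text.encode()
        enc = base64.b64encode(raw).decode()
        for i in range(0, len(enc), 4):
            chunk = enc[i:i + 4]
            if "=" not in chunk:  # skip padded final chunks for simplicity
                table[chunk] = raw[(i // 4) * 3:(i // 4) * 3 + 3]
    return table

def predict(table: dict[str, bytes], encoded: str) -> bytes:
    """'Decoding' by recall: emit the learned continuation of each chunk,
    never executing the actual Base64 algorithm."""
    return b"".join(table.get(encoded[i:i + 4], b"???")
                    for i in range(0, len(encoded), 4))

# The "training corpus" happens to cover the chunks seen at test time.
table = train(["hello world!"])
test = base64.b64encode(b"hello wor").decode()  # unseen as a whole string
print(predict(table, test))  # prints b'hello wor'
```

The predictor succeeds only because the test input's chunks appeared during "training", which mirrors why LLMs handle common Base64 strings far better than arbitrary binary blobs.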