how do you decompress all those 4 words from one token?

Not from one token, from one embedding. Text carries relatively little information per token: it is possible to compress several token embeddings into a single token embedding.

The how varies. The CALM paper seems to have used an MLP to compress an N×D input (N embeddings of size D) into a single D-dimensional embedding, and another MLP to decompress it back.
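As a rough sketch of that shape bookkeeping (not CALM's actual architecture: the layer sizes, the tanh nonlinearity, and the random untrained weights here are all illustrative assumptions), a compressor MLP maps N·D numbers down to D, and a decompressor maps them back:

```python
import numpy as np

# Illustrative sketch only: compress N token embeddings (size D each) into
# one D-dim embedding with an MLP, and decompress with a second MLP.
# Weights are random here; in practice both would be trained end-to-end.
N, D, H = 4, 8, 32  # chunk size, embedding dim, hidden dim (all made up)

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    # one hidden layer; tanh stands in for whatever nonlinearity is used
    return np.tanh(x @ W1 + b1) @ W2 + b2

# compressor: (N*D,) -> (D,)
Wc1, bc1 = rng.normal(size=(N * D, H)) * 0.1, np.zeros(H)
Wc2, bc2 = rng.normal(size=(H, D)) * 0.1, np.zeros(D)
# decompressor: (D,) -> (N*D,)
Wd1, bd1 = rng.normal(size=(D, H)) * 0.1, np.zeros(H)
Wd2, bd2 = rng.normal(size=(H, N * D)) * 0.1, np.zeros(N * D)

tokens = rng.normal(size=(N, D))                   # N token embeddings
z = mlp(tokens.reshape(-1), Wc1, bc1, Wc2, bc2)    # single D-dim embedding
recon = mlp(z, Wd1, bd1, Wd2, bd2).reshape(N, D)   # back to N embeddings

print(z.shape, recon.shape)  # (8,) (4, 8)
```

The point is just that nothing exotic is needed dimensionally: a plain MLP can map N embeddings to one and back, and training decides what information survives the bottleneck.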

The mechanism would be prediction (learnt during training), not decompression.

It's the same as LLMs being able to "decode" Base64, or work with sub-word tokens for that matter: the model just learns to predict that

<compressed representation> will be followed by (or preceded by) <decompressed representation>.