The CALM paper https://shaochenze.github.io/blog/2025/CALM/ says it is possible to compress 4 tokens into a single embedding, so... image = 4 × 256 = 1024 words > 1000 words. QED
2.4% relative error is not bad.
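(Assuming the error is measured against the proverbial 1000 words: (1024 − 1000) / 1000 = 0.024, i.e. 2.4%.)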
Reminds me of Babbage making allowance for meter.
"""
"""Shouldn't it be the other way around if the population is increasing? Every minute one is born = 1440 born/day, every minute and a sixteenth ~= 1335 dead/day for a net population increase of 105/day.
It means that in every minute, one and a sixteenth of a man is born.
Wouldn't "one and a sixth" be more accurate in both respects?
How do you decompress all those 4 words from one token?
Not from one token, but from one embedding. Text has a low information density: it is possible to compress a few token embeddings into a single token embedding.
The how varies. The CALM paper seems to use an MLP to compress an N×D input (N embeddings of size D) into a single D-dimensional embedding, and another MLP to decompress it back; see the sketch below.
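Roughly something like this, as a minimal PyTorch sketch of that idea; it is not the paper's actual code, and the layer sizes, depth, and MSE objective are all assumptions:

```python
# Minimal sketch of the compress/decompress idea described above.
# NOT the CALM paper's actual code: layer sizes, depths, and the MSE
# objective are illustrative assumptions.
import torch
import torch.nn as nn

N, D = 4, 256  # N token embeddings of size D (made-up sizes)

compress = nn.Sequential(            # (N*D) -> D
    nn.Linear(N * D, 4 * D), nn.GELU(), nn.Linear(4 * D, D)
)
decompress = nn.Sequential(          # D -> (N*D)
    nn.Linear(D, 4 * D), nn.GELU(), nn.Linear(4 * D, N * D)
)

tokens = torch.randn(1, N, D)                  # 4 token embeddings
z = compress(tokens.flatten(1))                # one D-dimensional embedding
recon = decompress(z).view(1, N, D)            # back to 4 embeddings

loss = nn.functional.mse_loss(recon, tokens)   # train both MLPs end to end
loss.backward()
print(z.shape, recon.shape, loss.item())
```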
The mechanism would be prediction (learnt during training), not decompression.
It's the same as LLMs being able to "decode" Base64, or work with sub-word tokens for that matter: the model just learns to predict that
<compressed representation> will be followed by (or preceded by) <decompressed representation>.
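To make that concrete, here is a hedged toy sketch (not the CALM paper's actual objective; vocab and layer sizes are made up) where the "decompressor" is simply trained to predict the original token ids from the compressed embedding:

```python
# Hedged sketch of "decompression as prediction": instead of exactly
# inverting the compressed embedding, a decoder head is trained to
# *predict* the original token ids from it with plain cross-entropy.
# All sizes below are made up.
import torch
import torch.nn as nn

V, D, N = 1000, 256, 4            # toy vocab size, embedding size, tokens per chunk

decode = nn.Sequential(           # D -> N*V logits, one distribution per position
    nn.Linear(D, 4 * D), nn.GELU(), nn.Linear(4 * D, N * V)
)

z = torch.randn(8, D)                        # batch of compressed embeddings
target_ids = torch.randint(0, V, (8, N))     # the token ids that were compressed

logits = decode(z).view(8, N, V)
loss = nn.functional.cross_entropy(logits.reshape(-1, V), target_ids.reshape(-1))
loss.backward()                              # learned prediction, no exact inverse
print(loss.item())
```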