For fun over the last few days, I've built a compressor/decompressor that uses the logits from an LLM for each token in the input, takes the rank of the actual token within those logits, and exponential-Golomb encodes the ranks. Then you work in reverse to regenerate the original.

It took me ages to get the prediction for the token after "hello" to match the prediction for the second token when running the model on the string "hello world", despite the fact that I was using a causal model. I tried all kinds of things before discovering that `quantized: false` was the important setting.
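Roughly, the rank-extraction step looks like this (a sketch using Hugging Face transformers and GPT-2 in Python; the `quantized: false` flag above comes from a different stack, so treat this as illustrative rather than the real code):

```python
# Sketch of the compression side: for each token, find the rank of the
# actual token in the model's sorted next-token predictions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_ranks(text: str) -> list[int]:
    ids = tokenizer.encode(text)
    with torch.no_grad():
        logits = model(torch.tensor([ids])).logits[0]   # (seq_len, vocab)
    # logits[i] predicts token i+1, so the very first token has no
    # prediction and has to be stored separately.
    ranks = []
    for i in range(1, len(ids)):
        order = torch.argsort(logits[i - 1], descending=True)
        ranks.append((order == ids[i]).nonzero().item())
    return ranks
```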

What's the Weissman score? Or more seriously :) did it perform well? Sounds like it should. If more and more text is AI slop, it should do well.

I don't fully understand what you said, but I guess tokens the model ranks as more likely are encoded with fewer bits. If your text is LLM output, then you may only need a bit or two per token?

I used exponential-Golomb coding, so the token at rank 0 is encoded with a single bit, ranks 1 and 2 are encoded with three bits, ranks 3-6 with five bits, etc.
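A minimal order-0 exp-Golomb encoder/decoder that produces those lengths looks roughly like this (a sketch, not the exact code from the project):

```python
# Order-0 exponential-Golomb code over strings of '0'/'1' characters.
# Lengths match the description above: 0 -> 1 bit, 1-2 -> 3 bits, 3-6 -> 5 bits.
def exp_golomb_encode(n: int) -> str:
    b = bin(n + 1)[2:]                  # binary of n+1: 0 -> "1", 3 -> "100"
    return "0" * (len(b) - 1) + b       # prefix with len-1 zeros

def exp_golomb_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    zeros = 0
    while bits[pos] == "0":             # count the leading zeros
        zeros += 1
        pos += 1
    value = int(bits[pos:pos + zeros + 1], 2) - 1
    return value, pos + zeros + 1       # decoded rank and next read position

assert [exp_golomb_encode(n) for n in (0, 1, 2, 3, 6)] == ["1", "010", "011", "00100", "00111"]
```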

In terms of performance, I've not done any serious testing, but e.g. the Wikipedia article on volcanoes compresses to about 20% of its original size using GPT-2. I've seen other strings compress even further.

The big issue is that while encoding is not unreasonable, decoding any significant amount of data is incredibly slow, since I'm doing a model run for every token in the output. It's bad enough that the scheme is probably unworkable as it is. I'm thinking about changing my code so that it streams out the tokens as it decodes them, so you're not just left there waiting for ages.
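The decode loop is inherently sequential, roughly like this sketch (Python/transformers again for illustration; `read_next_rank` is just a placeholder for the exp-Golomb decoder, and model/tokenizer are as in the earlier sketch):

```python
import torch

# Decoding sketch: every output token needs a fresh forward pass over the
# tokens decoded so far, which is why it's so slow.
def decode(bitstream, model, tokenizer, first_token_id, n_tokens):
    ids = [first_token_id]                                  # first token stored verbatim
    pos = 0
    for _ in range(n_tokens - 1):
        rank, pos = read_next_rank(bitstream, pos)          # next encoded rank
        with torch.no_grad():
            logits = model(torch.tensor([ids])).logits[0, -1]
        order = torch.argsort(logits, descending=True)
        ids.append(order[rank].item())                      # rank -> token id
        # streaming variant: yield tokenizer.decode(ids[-1:]) here
    return tokenizer.decode(ids)
```

Caching attention state between steps (`past_key_values` in transformers) would avoid recomputing the whole prefix each time, but it's still one model call per decoded token, so streaming the output mostly helps perceived latency.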

I don't know about Golomb coding, but with arithmetic coding (AC) you can do stream decoding, if I remember correctly.

I supervised a student project whose goal was exactly that: implementing compression with LLMs using AC.

Since AC is essentially optimal, if your LLM has an average cross entropy of x nats per token on some dataset, you can expect it to compress that data at about x nats per token on average!
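To make the arithmetic concrete (the numbers below are made up, not measured):

```python
# Illustrative only: the 2.5 nats/token cross entropy is a hypothetical figure,
# and ~4 characters per GPT-2 token is just a rough rule of thumb for English.
import math

xent_nats = 2.5                              # hypothetical average cross entropy
bits_per_token = xent_nats / math.log(2)     # nats -> bits, ~3.6 here
bits_per_char = bits_per_token / 4           # assuming ~4 chars per token
print(bits_per_char / 8)                     # fraction of raw 8-bit text, ~0.11 here
```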

Arithmetic coding looks like an extremely interesting approach, given that at each step you can use the model to give you the probability of every candidate next token.
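A toy sketch of that idea, using exact fractions to dodge the renormalization and precision bookkeeping a real arithmetic coder needs (so it's only workable for short inputs); `next_token_probs` is a placeholder standing in for a model call:

```python
from fractions import Fraction

# Toy arithmetic coder over token ids. `next_token_probs(prefix)` stands in
# for a model forward pass returning {token_id: probability}.

def cumulative(probs):
    # Fixed token order so encoder and decoder agree on the intervals.
    lo, out = Fraction(0), {}
    for tok in sorted(probs):
        p = Fraction(probs[tok]).limit_denominator(1 << 16)
        out[tok] = (lo, lo + p)
        lo += p
    return out

def encode(tokens, next_token_probs):
    low, high = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        a, b = cumulative(next_token_probs(tokens[:i]))[tok]
        low, high = low + (high - low) * a, low + (high - low) * b
    return (low + high) / 2          # any number inside the final interval

def decode(code, n_tokens, next_token_probs):
    tokens = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(n_tokens):
        x = (code - low) / (high - low)
        for tok, (a, b) in cumulative(next_token_probs(tokens)).items():
            if a <= x < b:
                tokens.append(tok)
                low, high = low + (high - low) * a, low + (high - low) * b
                break
    return tokens

# e.g. with a dummy model that always predicts {0: 0.7, 1: 0.2, 2: 0.1}:
#   code = encode([0, 0, 1, 2], lambda prefix: {0: 0.7, 1: 0.2, 2: 0.1})
#   decode(code, 4, lambda prefix: {0: 0.7, 1: 0.2, 2: 0.1})  # -> [0, 0, 1, 2]
```

Swapping in real LLM probabilities means replacing `next_token_probs` with a model forward pass and redoing the bookkeeping in finite precision, which is where the actual implementation work is.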