Hacker News

Bellard has a very interesting project, `ts_zip`, a compression tool powered by LLMs. It's just an "experiment" and should never be used in production, but it's very clever.

The description on his website is amusing: "The ts_zip utility can compress (and hopefully decompress) text files using a Large Language Model."

https://bellard.org/ts_zip/

hbn 8 hours ago

> (and hopefully decompress)

If the decompression is optional, I've got a really impressive compression algorithm in mind!

notpachet 8 hours ago

That's my favorite algorithm of all time

zeroq 9 hours ago

But that's exactly what LLMs are. :)

My mental model and go-to ELI5 is "imagine you compressed the whole internet into a zip-like archive and you have an extremely clever and efficient way to search it for data".

I'm old enough to remember the time when you could order Wikipedia on CDs, and I don't see much difference between that and downloading an LLM.

santiagobasulto an hour ago

That is true, but I have to be honest and say that I didn’t make the connection until I saw Bellard’s project for the first time, and I said: “ah! That actually makes A LOT of sense”

AceJohnny2 7 hours ago

There is a field of competitive compression algorithms, where time and computation are not factors. People have made compressors that take hours (days?) to compress the test corpus.

A long-running kinda-joke in the field is that the upper bound of compression is "AI-complete", where instead of compressing, say, the text data of the complete works of Shakespeare, the compressor just encodes "The Complete Works of Shakespeare", and the AI decompressor regenerates the output from that prompt.

With the advent of LLMs, Bellard just made that joke a reality.
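The "compression is prediction" idea in this thread can be made concrete with arithmetic coding. What follows is a minimal Python sketch, not Bellard's implementation: `predict` is a hypothetical toy model (adaptive letter counts with add-one smoothing) standing in for an LLM's next-token probabilities, and exact `Fraction` arithmetic replaces the fixed-precision integer coder a real tool would use. Each symbol costs roughly -log2 p bits, so a model that predicts the text well yields a shorter code. Decoding only works if the decoder reproduces exactly the same probabilities as the encoder.

```python
from fractions import Fraction
from math import ceil, log2

# Toy alphabet: lowercase letters plus space (27 symbols), kept in sorted order.
ALPHABET = sorted(set("abcdefghijklmnopqrstuvwxyz "))


def predict(context: str) -> dict[str, Fraction]:
    """Hypothetical stand-in for an LLM: P(next symbol | context).

    Adaptive order-0 model: counts of each symbol seen so far, plus one
    (add-one smoothing) so that no symbol ever gets probability zero.
    """
    counts = {s: 1 + context.count(s) for s in ALPHABET}
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in counts.items()}


def encode(text: str) -> tuple[int, int]:
    """Arithmetic-code `text`; returns (m, k) meaning the k-bit fraction m / 2**k."""
    low, width = Fraction(0), Fraction(1)
    for i, sym in enumerate(text):
        probs = predict(text[:i])
        # Cumulative probability of all symbols ordered before `sym`.
        cum = sum((probs[s] for s in ALPHABET if s < sym), Fraction(0))
        # Narrow the interval to the sub-range assigned to `sym`.
        low += width * cum
        width *= probs[sym]
    # Find the shortest binary fraction m / 2**k inside [low, low + width).
    k = 0
    while Fraction(ceil(low * 2**k), 2**k) >= low + width:
        k += 1
    return ceil(low * 2**k), k


def decode(m: int, k: int, length: int) -> str:
    """Invert `encode`. The text length is assumed to be sent separately."""
    x = Fraction(m, 2**k)
    low, width = Fraction(0), Fraction(1)
    out = ""
    for _ in range(length):
        probs = predict(out)  # must match the encoder's probabilities exactly
        cum = Fraction(0)
        for s in ALPHABET:
            # Pick the symbol whose sub-range contains x.
            if x < low + width * (cum + probs[s]):
                break
            cum += probs[s]
        out += s
        low += width * cum
        width *= probs[s]
    return out


if __name__ == "__main__":
    text = "aaaaaaaaaaaaaaaaaaab"
    m, k = encode(text)
    assert decode(m, k, len(text)) == text
    uniform_bits = len(text) * log2(len(ALPHABET))
    print(f"adaptive model: {k} bits, uniform model: {uniform_bits:.1f} bits")
```

Here the toy model quickly learns that `a` is likely, so the 20-character string needs far fewer bits than the roughly 95 a uniform 27-symbol model would spend. Swapping `predict` for a much stronger predictor is the (assumed) route by which an LLM turns into a compressor.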