Bellard has a very interesting project called `ts_zip`, a compression tool powered by LLMs. It's just an "experiment" and should never be used in production, but it's very smart.

The description on his website is amusing: "The ts_zip utility can compress (and hopefully decompress) text files using a Large Language Model"

https://bellard.org/ts_zip/
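For anyone wondering how a language model becomes a compressor at all: the usual trick is to have the model predict the next symbol and let an entropy coder spend about `-log2(p)` bits on whatever actually comes next, so better predictions mean fewer bits. Below is a rough sketch of that idea only, not `ts_zip`'s code. A toy adaptive character model stands in for the LLM, the function name `ideal_compressed_bits` is made up for illustration, and it only computes the size an ideal coder would reach rather than producing a real bitstream.

```python
import math
from collections import defaultdict


def ideal_compressed_bits(data: bytes) -> float:
    """Estimate the size (in bits) an ideal entropy coder would reach.

    Assumption: a toy adaptive order-1 byte model (predict the next byte
    from the previous one) stands in for the LLM. A real tool would plug
    in a far stronger predictor and an actual arithmetic coder.
    """
    counts = defaultdict(lambda: defaultdict(int))  # counts[prev][byte]
    totals = defaultdict(int)                       # totals[prev]
    bits = 0.0
    prev = -1  # sentinel context for the first byte

    for byte in data:
        # Laplace-smoothed probability of this byte given the previous one.
        p = (counts[prev][byte] + 1) / (totals[prev] + 256)
        # Ideal code length for an event with probability p.
        bits += -math.log2(p)
        # Update the model only after coding, so a decoder that has seen
        # the same prefix can rebuild exactly the same probabilities.
        counts[prev][byte] += 1
        totals[prev] += 1
        prev = byte

    return bits


if __name__ == "__main__":
    sample = b"to be or not to be " * 50
    print(f"raw: {8 * len(sample)} bits, modeled: {ideal_compressed_bits(sample):.0f} bits")
```

The decoder runs the same model in lockstep, which is why it needs exactly the same model (and the same numerical behavior) as the compressor. Swap the toy model for something that predicts text much better and the bit count drops accordingly.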

> (and hopefully decompress)

If the decompression is optional, I've got a really impressive compression algorithm in mind!

That's my favorite algorithm of all time.

But that's exactly what LLMs are. :)

My mental model and go-to ELI5 is "imagine you compressed the whole internet into a zip-like archive and you have an extremely clever and efficient way to search it for data".

I'm old enough to remember the time when you could order Wikipedia on CDs, and I don't see much difference between that and downloading an LLM.

That is true, but I have to be honest and say that I didn’t make the connection until I saw Bellard’s project for the first time, and I said: “ah! That actually makes A LOT of sense”

There is a field of competitive compression algorithms, where time and computation are not factors. People have made compressors that take hours (days?) to compress the test corpus.

A long-running kinda-joke in the field is that the upper bound of compression is "AI-complete": instead of compressing, say, the text of the complete works of Shakespeare, the compressor just encodes "The Complete Works of Shakespeare", and the AI decompressor regenerates the output from that prompt.

With the advent of LLMs, Bellard just made that joke a reality.