Does anyone have any sources to read about AI-based compression?

I remember hearing a lot about "compression is a lot about prediction", but I don't remember reading about any practical results.

It can be done and has been done; it's just not very practical. Needing a language model of a dozen GB just to squeeze a few more percent out of plaintext, which already compresses well and is tiny compared to images or video, is not worth it outside benchmarks. Even superior traditional compression algorithms often go unused because of insufficient software support. A multi-gigabyte decompressor as big as the rest of your OS installation is not practical to distribute or standardize. It would also take a lot of memory at runtime for decompression, overshadowing the efficiency gains in everyday use. Only at huge archival scale might the gains be worth it. But for long-term archival, fragile formats that depend on huge amounts of arbitrary extra knowledge aren't a good idea. I am not quite sure whether AI-based compression would make things more robust, by allowing corruption to be fixed based on context, or worse, by having a single bit flip produce completely opposite but still plausible-looking text. At least with traditional compression it's usually obvious when corruption causes gibberish. And then you have the problem of versioning: you need exactly the same version of the dozen-GB model for decompression as was used for compression. Just one of them is questionable; now imagine having to store a few dozen of them. Most computers have code supporting at least half a dozen compression formats, many of those are parametrized so that a single algorithm handles multiple variations of the compression scheme, and then many apps bundle their own copies of the compression library.

I mostly agree, however:

> But for long term archival fragile formats which depend on huge arbitrary extra knowledge isnt a good idea.

This doesn't need to be a problem: you can and should layer an error-correcting code on top.
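To make "layer an error-correcting code on top" concrete, here is a minimal sketch using a Hamming(7,4) code, which can repair any single flipped bit per 7-bit codeword. The function names are made up for illustration, and this particular code is chosen only because it is short; real archival setups typically use stronger codes such as Reed-Solomon.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> int:
    """Correct up to one flipped bit, then return the 4 data bits as an int."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if pos:
        c[pos - 1] ^= 1
    d1, d2, d3, d4 = c[2], c[4], c[5], c[6]
    return d1 | (d2 << 1) | (d3 << 2) | (d4 << 3)


def protect(data: bytes) -> list[list[int]]:
    """Wrap already-compressed bytes: two codewords per byte (low nibble first)."""
    words = []
    for byte in data:
        words.append(hamming74_encode(byte & 0x0F))
        words.append(hamming74_encode(byte >> 4))
    return words


def recover(words: list[list[int]]) -> bytes:
    """Undo protect(), fixing single-bit errors in each codeword along the way."""
    nibbles = [hamming74_decode(w) for w in words]
    return bytes(lo | (hi << 4) for lo, hi in zip(nibbles[0::2], nibbles[1::2]))
```

The point is that the ECC sits outside the compressor: a bit flip is repaired before the decompressor (AI-based or not) ever sees the stream, so it never gets the chance to turn into plausible-looking wrong text.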

"Compression = prediction + entropy coding" was already an insight from Claude Shannon in the 1950s.

Since LLMs are inherently token predictors, using them for lossless compression is almost trivial. For something close to the state of the art, see e.g. Fabrice Bellard's (of course) ts_zip: https://bellard.org/ts_zip/
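A rough sketch of why prediction turns into compression: an ideal entropy coder spends about -log2(p) bits on a symbol the model predicted with probability p, so a better predictor means fewer bits. The toy below uses an adaptive byte-context model as a stand-in for the LLM (this is an illustration only, not how ts_zip works) and just adds up the ideal code length; a real arithmetic coder would land very close to that total.

```python
import math
from collections import defaultdict


def ideal_bits(data: bytes, order: int) -> float:
    """Sum of -log2 p(symbol) under an adaptive model that predicts each byte
    from the previous `order` bytes, with add-one smoothing over 256 symbols.

    The decoder can rebuild the same counts as it goes, so no model has to be
    transmitted; only the coded symbols are.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> symbols seen
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]
        p = (counts[ctx][sym] + 1) / (totals[ctx] + 256)
        bits += -math.log2(p)
        # Update the model only after "coding" the symbol, as a decoder would.
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


if __name__ == "__main__":
    text = b"ab" * 200
    print("raw bits:       ", 8 * len(text))
    print("order-0 model:  ", round(ideal_bits(text, 0)))
    print("order-1 model:  ", round(ideal_bits(text, 1)))
```

Swapping the counting model for a neural network that outputs next-token probabilities changes nothing structurally; it only makes `p` larger for the symbol that actually occurs, which is where the extra compression comes from.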

I think some of the confusion comes from the fact that there is a pretty big difference between the techniques employed by compressors that optimize compression ratio at the cost of nearly everything else, like ts_zip above, and practical tools that try to balance compression ratio against limits on CPU time and memory, like zstd.

When optimizing for compression ratio, the prediction + entropy coding paradigm dominates. Practical tools, even modern ones like zstd, are mostly based around sliding window compression à la LZ77 (zip/deflate), with the main selling point of more modern tools being that they scale up to larger window sizes and run really, really fast. Some of these (like LZO) skip the entropy coding step entirely to save time. zstd has both Huffman coding and FSE: Huffman coding is suboptimal, but presumably it's an option because it's faster, and on lower compression levels it's preferable to be fast.
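For contrast with the prediction-based approach, here is a toy sliding-window compressor in the LZ77 style. It is a deliberately naive sketch (brute-force match search, tokens kept as Python tuples, no entropy coding step at all), with made-up function names and default parameters; it is not how zstd or deflate are actually implemented.

```python
def lz77_compress(data: bytes, window: int = 4096, max_len: int = 255):
    """Return a list of (offset, length, next_byte) tokens.

    Each token means: copy `length` bytes starting `offset` bytes back in the
    output produced so far, then append the literal `next_byte`.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        # Brute-force search of the window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one byte early so a literal always remains for the token.
            while (length < max_len
                   and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress(tokens) -> bytes:
    """Rebuild the original bytes from (offset, length, next_byte) tokens."""
    out = bytearray()
    for off, length, nxt in tokens:
        for _ in range(length):
            out.append(out[-off])  # byte-by-byte copy handles overlapping matches
        out.append(nxt)
    return bytes(out)


if __name__ == "__main__":
    sample = b"abracadabra " * 20
    tokens = lz77_compress(sample)
    assert lz77_decompress(tokens) == sample
    print(len(sample), "bytes ->", len(tokens), "tokens")
```

Note there is no probability model anywhere: the compressor only points back at bytes it has already seen, and decompression is little more than memory copies. That is why this family is so fast, and also why it leaves compression ratio on the table compared to prediction plus entropy coding.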

Anyway, the bottom line is: don't get confused between the state of the art in terms of compression ratio, and practical tools. Those are quite different things.