The idea that compression = prediction + entropy coding goes back to Claude Shannon's work in the late 1940s and early 1950s.
Since LLMs are inherently next-token predictors, using them for lossless compression is almost trivial: feed the model's predicted probabilities into an entropy coder. For something close to the state of the art, see e.g. Fabrice Bellard's (of course) ts_zip: https://bellard.org/ts_zip/
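To make the "prediction + entropy coding" idea concrete, here is a minimal sketch. It is not how ts_zip works; it swaps the LLM for a tiny adaptive byte-context model with add-one smoothing, and instead of running a real arithmetic coder it just sums the ideal cost of -log2(p) bits per symbol. The function name and the sample text are made up for illustration.

```python
import math
from collections import defaultdict

def ideal_code_length_bits(data: bytes, order: int) -> float:
    """Bits an ideal entropy coder would spend on `data` when driven by an
    adaptive order-`order` byte model with add-one (Laplace) smoothing.
    Hypothetical helper for illustration only."""
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]
        # Predict: probability of the next byte given the counts seen so far.
        p = (counts[ctx][sym] + 1) / (totals[ctx] + 256)
        # Entropy-code: an ideal coder spends -log2(p) bits on this symbol.
        bits += -math.log2(p)
        # Update after coding, so a decoder could keep the same model in sync.
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits

# Made-up sample input: highly repetitive text.
text = b"the quick brown fox jumps over the lazy dog. " * 200
order0 = ideal_code_length_bits(text, 0)  # predicts from byte frequencies only
order2 = ideal_code_length_bits(text, 2)  # predicts from the previous two bytes
print(f"raw: {8 * len(text)} bits, order-0: {order0:.0f}, order-2: {order2:.0f}")
```

The better predictor (order 2) needs fewer bits for the same data, and nothing else changed. Replace the counting model with an LLM and the tally with a real arithmetic coder and you have the general shape of the high-ratio approach.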
I think some of the confusion comes from the fact that there is a pretty big difference between the techniques used by compressors that optimize compression ratio at the cost of nearly everything else, like ts_zip above, and practical tools that try to balance compression ratio against limits on CPU time and memory, like zstd.
When optimizing for compression ratio, the prediction + entropy coding paradigm dominates. Practical tools, even modern ones like zstd, are mostly based on sliding-window compression à la LZ77 (as in zip/deflate), with the main selling point of the more modern tools being that they scale up to larger window sizes and run really fast. Some of these (like LZO) skip the entropy coding step entirely to save time. zstd has both Huffman coding and FSE: Huffman coding is suboptimal, but presumably it's an option because it's faster, and at lower compression levels speed matters more.
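For contrast, here is a toy sketch of the sliding-window side: a greedy LZ77-style matcher with no entropy coding step at all. It is an illustration under simplifying assumptions (brute-force window search, a made-up token format, hypothetical function names), not the actual format or algorithm of deflate, LZO, or zstd.

```python
def lz77_encode(data: bytes, window: int = 4096,
                min_match: int = 3, max_match: int = 255) -> list:
    """Greedy LZ77: emit ('lit', byte) or ('match', distance, length) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the window; real tools use hash tables/chains.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            # Back-reference: "copy best_len bytes from best_dist bytes ago".
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens

def lz77_decode(tokens: list) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            # Copy byte by byte so overlapping matches (dist < length) work.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)

# Made-up sample input.
sample = b"abracadabra " * 20
tokens = lz77_encode(sample)
restored = lz77_decode(tokens)
print(len(sample), "bytes ->", len(tokens), "tokens")
```

Note that there is no probability model anywhere: the only "prediction" is that data seen recently will show up again. That is cheap to compute, which is the point. An entropy coder (Huffman, FSE) can then be layered on top of the tokens, or left out when speed matters most.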
Anyway, the bottom line is: don't get confused between the state of the art in terms of compression ratio, and practical tools. Those are quite different things.