There are many ways around this, the simplest being to treat raw bytes as tokens (cf. Google's ByT5[1]). See also BLT[2] from Meta and ByteFormer[3] from Apple.
[1]: https://arxiv.org/abs/2105.13626
[2]: https://arxiv.org/abs/2412.09871
[3]: https://arxiv.org/abs/2306.00238
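To make "bytes as tokens" concrete, here is a minimal sketch. The function names and the offset of 3 (leaving room for a few special-token IDs such as padding and end-of-sequence) are assumptions for illustration, not the actual tokenizer of any of the models above.

```python
def bytes_to_tokens(data: bytes, offset: int = 3) -> list[int]:
    """Map each byte (0..255) to a token ID, shifted past reserved special IDs."""
    return [b + offset for b in data]


def tokens_to_bytes(tokens: list[int], offset: int = 3) -> bytes:
    """Invert the mapping, skipping any IDs in the reserved (assumed) special range."""
    return bytes(t - offset for t in tokens if t >= offset)


text = "héllo"
tokens = bytes_to_tokens(text.encode("utf-8"))
# "é" takes two bytes in UTF-8, so 5 characters become 6 tokens.
print(len(text), len(tokens))
print(tokens_to_bytes(tokens).decode("utf-8"))
```

The vocabulary is tiny (256 plus a few specials) and works for any file type, but sequences get longer than with subword tokens, which is the trade-off these papers are trying to manage.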
Transformers do this for any stream of tokens. Those tokens can map to anything you want, and you will still get lossy compression. Human-produced text just happens to be dense, available, and a useful prior, but it is not intrinsically required. See 3D vision transformers, for example.
It is not possible to losslessly compress arbitrary data. If the data is already compressed, encrypted, or randomly generated, no lossless method can be expected to shrink it. This is foundational information theory.
https://en.wikipedia.org/wiki/Lossless_compression#Limitatio...
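You can see this for yourself with a quick experiment. This is only a sketch using Python's standard `zlib` module; the helper name `compression_ratio`, the sample sizes, and the seed are my own choices for illustration.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data, 9)) / len(data)


# Seeded pseudo-random bytes stand in for encrypted or random data here.
rng = random.Random(0)
random_data = bytes(rng.getrandbits(8) for _ in range(100_000))

# Highly repetitive data of the same length, for contrast.
repetitive_data = b"the quick brown fox " * 5_000

print("random:    ", compression_ratio(random_data))
print("repetitive:", compression_ratio(repetitive_data))
```

The repetitive input shrinks to a small fraction of its size, while the random input does not shrink at all in practice and picks up a little framing overhead instead.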
Whereas if we're talking about lossy compression (as the person you replied to is), we certainly can compress arbitrary data, almost as much as we want.
The hard question, then, is how much the decompressed output looks like the original.
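As a toy illustration of that trade-off, here is a sketch of a deliberately crude lossy scheme: keep only the top `bits` bits of every byte. The function names are hypothetical and this is not how any real codec (or a transformer) does it; it just shows that the size can be dialed down freely while the error goes up.

```python
import random


def lossy_compress(data: bytes, bits: int) -> bytes:
    """Keep only the top `bits` bits (1..8) of each byte, packed tightly."""
    acc = 0
    for b in data:
        acc = (acc << bits) | (b >> (8 - bits))
    nbytes = (len(data) * bits + 7) // 8
    acc <<= nbytes * 8 - len(data) * bits  # left-align into whole bytes
    return acc.to_bytes(nbytes, "big")


def lossy_decompress(blob: bytes, bits: int, n: int) -> bytes:
    """Rebuild n bytes, placing each value at the middle of its bucket."""
    acc = int.from_bytes(blob, "big") >> (len(blob) * 8 - n * bits)
    mask = (1 << bits) - 1
    half_bucket = (1 << (8 - bits)) >> 1
    out = bytearray(n)
    for i in range(n - 1, -1, -1):
        out[i] = ((acc & mask) << (8 - bits)) | half_bucket
        acc >>= bits
    return bytes(out)


def max_abs_error(a: bytes, b: bytes) -> int:
    """Largest per-byte difference between original and reconstruction."""
    return max(abs(x - y) for x, y in zip(a, b))


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
for bits in (8, 4, 2, 1):
    blob = lossy_compress(original, bits)
    restored = lossy_decompress(blob, bits, len(original))
    print(bits, len(blob), max_abs_error(original, restored))
```

Even random bytes "compress" 8x at 1 bit per byte; the output just looks less and less like the input, which is exactly the hard question.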