Hacker News

hyghjiyhu 2 days ago

I think one important factor you failed to account for is frameshifting. Compression algorithms work on bytes (8 bits). Imagine the exact same sequence occurring twice, but at different offsets mod 4. Your encoding will then produce completely different bytes for the two copies, and the compression algorithm will be unable to exploit the repetition.
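A minimal sketch of the frameshift effect, assuming a 2-bit-per-base packing with four bases per byte (which is what "offsets mod 4" implies). The `pack_2bit` helper, the code table, and the padding rule are hypothetical and not taken from the article, and `zlib` stands in here for any byte-oriented compressor:

```python
import random
import zlib

# Hypothetical 2-bit code table; the article's actual mapping may differ.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_2bit(seq: str) -> bytes:
    """Pack nucleotides four per byte, first base in the high bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4].ljust(4, "A")  # assumption: pad the last byte with A
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        out.append(byte)
    return bytes(out)

rng = random.Random(0)
motif = "".join(rng.choice("ACGT") for _ in range(2000))

# Second copy starts at base offset 2000 (2000 % 4 == 0): byte-aligned repeat.
aligned = pack_2bit(motif + motif)
# One extra base pushes the second copy to offset 2001 (2001 % 4 == 1).
shifted = pack_2bit(motif + "G" + motif)

c_aligned = len(zlib.compress(aligned, 9))
c_shifted = len(zlib.compress(shifted, 9))
print("aligned:", c_aligned, "bytes; shifted:", c_shifted, "bytes")
```

In the aligned case the packed second half is byte-for-byte identical to the first, so the compressor can reference it. In the shifted case every byte of the second copy straddles two bytes of the first, so the repeat is invisible at byte granularity.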

dwattttt 2 days ago

I was actually under the impression compression algorithms tend to work over a bitstream, but I can't entirely confirm that.

Sesse__ a day ago

Some are bit-by-bit (e.g. the PPM family of compressors[1]), but the normal input granularity for most compressors is a byte. (There are even specialized ones that work on e.g. 32 bits at a time.)

[1] Many of the context models in a typical PPM compressor will be byte-by-byte, so even that isn't fully clear-cut.

bede 17 hours ago

A Zstd maintainer clarified this: https://news.ycombinator.com/item?id=45251544

> Ultimately, Zstd is a byte-oriented compressor that doesn't understand the semantics of the data it compresses

vintermann a day ago

They output a bitstream, yes, but I don't know of anything general-purpose that effectively consumes anything smaller than bytes (unless you count the various specialized handlers in general-purpose compression algorithms, e.g. for dealing with long lists of floats).