If that's the purpose, couldn't you store the hash and throw away the compressed image?

(As others have said, compression is deterministic for the same algorithm, parameters, and input data.)

Zstd, for example, only promises determinism within the same version of the library. I've personally seen the hashes mutate between pull and export. Things like tar padding also make a difference. Really, the thing to do is to hash the _uncompressed_ data and let compression be a transport/registry detail. That's what I've done, at least.
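
To illustrate the idea (just a sketch, not any particular tool's implementation): digest the decompressed layer stream rather than the compressed blob that travels over the wire, so the compressor's version and settings stop mattering. In Go, roughly:

```go
// Sketch: digest the *uncompressed* layer tar, so the layer's identity
// doesn't depend on which gzip/zstd implementation or settings produced
// the compressed blob used for transport.
package main

import (
	"compress/gzip"
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

func main() {
	// layer.tar.gz is a hypothetical compressed layer blob.
	f, err := os.Open("layer.tar.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Decompress first; the digest is taken over the tar bytes, not the gzip bytes.
	zr, err := gzip.NewReader(f)
	if err != nil {
		panic(err)
	}
	defer zr.Close()

	h := sha256.New()
	if _, err := io.Copy(h, zr); err != nil {
		panic(err)
	}
	fmt.Printf("sha256:%x\n", h.Sum(nil))
}
```

This is essentially what the diff IDs in the OCI image config already record (digests of the uncompressed layer tars); it's the manifest's layer descriptors that pin the compressed bytes.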

I didn't know that about zstd, that's a bit unfortunate.

Tar isn't related here, though; we're talking about compression, not archival formats.

Yes, compression being part of the OCI image's digest was (in hindsight) a poor decision. _Technically_ OCI images allow uncompressed layers, so layers could be included without compression (with compression applied only at the transport level); this would allow layers to be fully reproducible. We explored some options to do this (and made some preparations; https://github.com/containerd/containerd/pull/8166), but also discovered that various implementations of registry clients didn't handle transport compression correctly (https://github.com/distribution/distribution/pull/3754), which could result in the client either pulling the full, uncompressed content, or in image validation failing.
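
For reference, a rough sketch of the two ways a layer can be referenced, assuming the standard image-spec and go-digest Go modules (the digests below are placeholders, not real content hashes):

```go
// Sketch: the same layer referenced as a compressed vs. an uncompressed blob.
package main

import (
	"fmt"

	"github.com/opencontainers/go-digest"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

func main() {
	// Compressed layer: the digest covers the gzip bytes, so it changes
	// whenever the compressor, its settings, or its version changes.
	compressed := ocispec.Descriptor{
		MediaType: ocispec.MediaTypeImageLayerGzip, // application/vnd.oci.image.layer.v1.tar+gzip
		Digest:    digest.Digest("sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"),
		Size:      12345,
	}

	// Uncompressed layer: the digest covers the raw tar bytes, so it is
	// reproducible regardless of how (or whether) the blob is compressed
	// in transit.
	uncompressed := ocispec.Descriptor{
		MediaType: ocispec.MediaTypeImageLayer, // application/vnd.oci.image.layer.v1.tar
		Digest:    digest.Digest("sha256:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"),
		Size:      67890,
	}

	fmt.Println(compressed.MediaType, uncompressed.MediaType)
}
```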

For my registry fork/custom pull client, I hash the uncompressed content and store it compressed under the uncompressed digest. This lets me have my cake and eat it too: compression-free digests, smaller storage costs, consistent compression settings, and the ability to spend extra CPU to recompress on the backend without breaking hashes. I control both the pull client and the registry, so it works.
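
A minimal sketch of that kind of storage scheme, with hypothetical types and paths (not the actual fork): blobs keyed by the digest of the uncompressed bytes, compressed at rest, decompressed on read so the key never depends on the compressor.

```go
// Sketch: blob store keyed by the uncompressed digest. The bytes on disk may
// be compressed (and recompressed later) without ever changing that key.
package main

import (
	"compress/gzip"
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

type BlobStore struct {
	root string // directory of gzip-compressed blobs, named by uncompressed digest
}

// Put reads uncompressed content, hashes it, and stores it compressed.
// The returned digest is stable no matter what compressor or level is used.
func (s *BlobStore) Put(content io.Reader) (string, error) {
	tmp, err := os.CreateTemp(s.root, "blob-*")
	if err != nil {
		return "", err
	}
	defer tmp.Close()

	h := sha256.New()
	zw := gzip.NewWriter(tmp)
	// Hash the uncompressed bytes while compressing them for storage.
	if _, err := io.Copy(io.MultiWriter(h, zw), content); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}

	dgst := fmt.Sprintf("sha256:%x", h.Sum(nil))
	return dgst, os.Rename(tmp.Name(), filepath.Join(s.root, dgst))
}

// Get returns the uncompressed content for a digest; callers can re-hash it
// to verify, independently of how it was compressed at rest.
func (s *BlobStore) Get(dgst string) (io.ReadCloser, error) {
	f, err := os.Open(filepath.Join(s.root, dgst))
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(f)
	if err != nil {
		f.Close()
		return nil, err
	}
	return zr, nil
}
```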

The whole reason is that compression is not deterministic across tooling.