There are some hurdles preventing that flow from achieving reproducible builds. As the bad guys get more sophisticated, it's going to become more and more important that one party can say "we trust this build hash" and a separate party can say "us too".

That's not going to work if both parties get different hashes when they build the image, and they will get different hashes as long as file modification timestamps (and other such hazards) are part of what gets hashed.
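To make the timestamp problem concrete, here is a minimal sketch in Python using only the standard library. The file name and contents are made up for illustration, and this is not how any particular build tool produces layers; it only shows that a one-second difference in an mtime changes the digest of an otherwise identical archive.

```python
import hashlib
import io
import tarfile


def tar_digest(mtime: int) -> str:
    """Return the SHA-256 of a one-file tar archive whose entry has the given mtime."""
    data = b"hello\n"  # hypothetical file contents
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo(name="app/hello.txt")  # hypothetical path
        info.size = len(data)
        info.mtime = mtime  # the only field that varies between calls
        tar.addfile(info, io.BytesIO(data))
    return hashlib.sha256(buf.getvalue()).hexdigest()


# Same content, same mtime: the digests match.
print(tar_digest(0) == tar_digest(0))
# Same content, mtime off by one second: the digests differ.
print(tar_digest(0) == tar_digest(1))
```

Two parties building the same sources a minute apart are in the second situation unless something pins the timestamps.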

Recent versions of buildkit have added support for SOURCE_DATE_EPOCH. I've been making my images reproducible since before that with my own tooling, regctl image mod [1], which backdates the timestamps.

It's not just the timestamps you need to worry about. Tar needs to be consistent about uid vs. username, gzip output depends on the implementation and its settings, and the JSON encoding can vary by implementation.
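As a rough illustration of pinning those variables, here is a Python sketch using only the standard library. It is not what buildkit or regctl do internally; the function names `build_layer` and `canonical_json` and the file contents are hypothetical. It sorts entries, fixes mtime and ownership fields, zeroes the gzip header timestamp, and serializes JSON with sorted keys and fixed separators.

```python
import gzip
import hashlib
import io
import json
import os
import tarfile


def build_layer(files: dict[str, bytes], epoch: int) -> bytes:
    """Build a gzip-compressed tar with every variable field pinned."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in sorted(files):  # fixed entry order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = epoch  # fixed timestamp instead of build time
            info.mode = 0o644
            info.uid = info.gid = 0  # numeric ids only
            info.uname = info.gname = ""  # no user or group names
            tar.addfile(info, io.BytesIO(data))
    # mtime=0 keeps the current time out of the gzip header.
    return gzip.compress(raw.getvalue(), compresslevel=9, mtime=0)


def canonical_json(obj) -> bytes:
    """Serialize with sorted keys and no optional whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Assumption: fall back to 0 when SOURCE_DATE_EPOCH is not set.
epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
layer_a = build_layer({"b.txt": b"two\n", "a.txt": b"one\n"}, epoch)
layer_b = build_layer({"a.txt": b"one\n", "b.txt": b"two\n"}, epoch)
print(hashlib.sha256(layer_a).hexdigest() == hashlib.sha256(layer_b).hexdigest())
```

This only guarantees identical bytes when both parties use the same compression implementation and settings. A different zlib build or another gzip tool can still produce different compressed output from the same tar.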

And all this assumes the commands being run are reproducible themselves. One issue I encountered there was how Alpine tracks its package install state from apk: it's a tar file that includes timestamps. There are also timestamps in logs. And any packages you install need to have their versions pinned.

All of this is hard, and the Dockerfile didn't make it easy, but it is possible. With the right tools installed, reproducing my own images has a documented process [2].

[1]: https://regclient.org/cli/regctl/image/mod/

[2]: https://regclient.org/install/#reproducible-builds

Does any of that matter if you’re not auditing the packages you install?

I’m more concerned about sources being poisoned than about the build process. Xz is a great example of this.