I'm struggling with caching right now. I'm trying to switch from GitHub Actions to just running stuff in containers, and it works. Except for caching.

Buildkit from Docker is just a pure bullshit design. Instead of the elegant layer-based system, there are now two daemons that fling TAR files around, and for no real reason that I can discern. But the worst part is that the caching is just plain broken.

Buildkit can be very efficient at caching, but you need to design your image build around it. Once any step encounters a cache miss, all remaining steps will too.
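To sketch what "designing around it" means in practice: order the steps from least to most frequently changing, so that an edit to your source doesn't invalidate the expensive dependency install. A minimal example, assuming a hypothetical Node project (the filenames and base image are illustrative):

```dockerfile
# The deps layer is keyed only on the manifests; editing source won't bust it.
FROM node:20 AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# Source changes invalidate only the layers from here down.
FROM deps AS build
COPY . .
RUN npm run build
```

The moment the `COPY . .` lands before `npm ci`, every push rebuilds the world.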

I'd also avoid loading the result back into the docker daemon unless you really need it there. Buildkit can output directly to a registry, or an OCI Layout, each of which will maintain the image digest and support multi-platform images (admittedly, those problems go away with the containerd storage changes happening, but it's still an additional export/import that can be skipped).
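As a concrete illustration (image name and output path are placeholders, not from the thread), both exports can be requested directly from `docker buildx build`:

```shell
# Push straight to a registry, skipping the export/import through dockerd:
docker buildx build -t registry.example.com/app:latest --push .

# Or write an OCI layout to disk; digests and multi-platform manifests
# are preserved in the layout:
docker buildx build --output type=oci,dest=app.tar .
```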

All that said, I think caching is often the wrong goal. Personally, I want reproducible builds, and those should bypass any cache to verify each step always has the same output. Also, when saving the cache, every build caches every step, even if those steps are never used in future builds. For my own projects, the net effect of adding a cache could be slower builds.

Instead of caching the image build steps, I think we should be spending a lot more effort on creating local proxies of upstream dependencies, removing the network overhead of pulling dependencies on every build. Compute-intensive build steps would still be slow, but a significant number of image builds could be sped up with a proxy at the CI server level, without tuning builds individually.
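For the image-pull half of that, one low-effort version is a pull-through cache on the CI host; e.g. pointing the daemon at a local registry mirror in `daemon.json` (the mirror URL is a placeholder):

```json
{
  "registry-mirrors": ["https://mirror.ci.internal:5000"]
}
```

Language package managers (npm, pip, Go modules, etc.) need their own proxies, but the principle is the same: the cache sits next to the builders instead of inside each build.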

> Buildkit can be very efficient at caching, but you need to design your image build around it.

Well, that's what I've been trying to do. And failing, because it simply doesn't work.

> I'd also avoid loading the result back into the docker daemon unless you really need it there.

I need Docker to provide me a reproducible environment to run lints, inspections, UI tests and so on. These images are quite massive. And because caching in Docker is broken, they were getting rebuilt every time we did a push.

Well. I switched to Podman and podman-compose. Now they do get cached, and the build time is within ~1 min with the help of the GHA cache.
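For reference, the GHA-cache side of a setup like this can be as simple as persisting podman's rootless storage between runs. A sketch with actions/cache (the path and key are illustrative, and tarring an overlay store has its own sharp edges):

```yaml
- name: Cache podman storage
  uses: actions/cache@v4
  with:
    path: ~/.local/share/containers/storage
    key: podman-${{ runner.os }}-${{ hashFiles('**/Dockerfile') }}
```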

And yes, my deployment builds are produced without any caching.

I went down this rabbit hole before; you have to ignore all the recommended approaches. The real solution is a build server with a global Docker install and a script that prunes the cache when disk usage goes above a certain percentage. The cache is local and instant. Pushing and pulling cache images is an insane solution.

What you are describing is basically remote buildkitd. That allows all of your docker builds to share a big cache. The cache-to/cache-from approach is of limited usefulness.

The layers are tar files, I’m confused what behavior you actually want that isn’t supported.

The original Docker (and the current Podman) created each layer as an overlay filesystem. So each layer was essentially an ephemeral container. If a build failed, you could actually just run the last successful layer with a shell and see what's wrong.

More importantly, the layers were represented as directories on the host system. So when you wanted to run something in the final container, Docker just needed to reassemble it.

Buildkit has broken all of it. Now building is done, essentially, in a separate system; the "docker buildx" command talks to it over a socket. It transmits the context, and gets the result back as an OCI image that it then needs to unpack.

This is an entirely useless step. It also breaks caching all the time. If you build two images that differ only slightly, the host still gets two full OCI artifacts, even if the two images share most of their layers.

It looks like their Bazel infrastructure optimized it by moving caching down to the file level.

Buildkit didn't break anything here, except that each individual build step is no longer exposed as a runnable image in docker. That was unfortunate, but you can actually have buildkit run a command in that filesystem these days, and buildx now even exposes a DAP interface.

Buildkit is far more efficient than the old model.

Buildkit is still a separate system, unlike the old builder. So you get that extra step of importing the result back.

And since it's a separate system, there are also these strange limitations. For example, I can't just cache pre-built images in an NFS directory and then just push them into the Buildkit context. There's simply no command for it. Buildkit can only pull them from a registry.

> Buildkit is far more efficient than the old model.

I've yet to see it work faster than podman+buildah. And it's also just plain buggy. Caching for multi-stage and/or parallel builds has been broken since the beginning. The Docker team just ignores it and closes the bugs: https://github.com/moby/buildkit/issues/1981 https://github.com/moby/buildkit/issues/2274 https://github.com/moby/buildkit/issues/2279

I understand why. I tried to debug it, and simply getting it running under a debugger is an adventure.

So far, I found that switching to podman+podman-compose is a better solution. At least my brain is good enough to understand them completely, and contribute fixes if needed.

Buildkit is integrated into dockerd the same way the old builder was. If you want a newer Buildkit you'll need to run it separately of course.

I'm not quite sure I understand what you are trying to do with nfs there. But you can definitely export the cache to a local filesystem and import it with cache-from. You can also provide named contexts.
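For illustration, the local-filesystem variant looks like this (the cache directory is a placeholder, and could sit on an NFS mount):

```shell
# Write the build cache to a directory; mode=max also keeps
# intermediate stages, not just the final layers:
docker buildx build --cache-to type=local,dest=/mnt/nfs/buildcache,mode=max .

# Seed a later build from that directory:
docker buildx build --cache-from type=local,src=/mnt/nfs/buildcache .
```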

"Buildkit can only pull them from a registry" is just plain false.

> Buildkit is integrated into dockerd the same way the old builder was. If you want a newer Buildkit you'll need to run it separately of course.

I don't think that the older builder created special containers for itself?

> I'm not quite sure I understand what you are trying to do with nfs there. But you can definitely export the cache to a local filesystem and import it with cache-from.

Which is dog-slow, because it squirts the cache through a socket. I have an NFS disk that can be used to cache the data directly. This was just one of the attempts to make it go faster.

> You can also provide named contexts.

Which can only refer to images that are built inside this particular buildkit or are pullable from a repo.

This is really all I want: a way to quickly reuse the previous state, saved in some format in the GitHub cache, NFS, or other storage.

> I don't think that the older builder created special containers for itself?

Buildkit doesn't create special containers for itself? It's literally a service integrated into dockerd.

> Which can only refer to images that are built inside this particular buildkit or are pullable from a repo.

No, it supports anything Buildkit can fetch: git, http, client dir... for that matter the client itself can shim that to be whatever it wants.

> This is really all I want: a way to quickly reuse the previous state, saved in some format in the GitHub cache, NFS, or other storage.

You can cache to GitHub actions cache, S3, az blob, gcs, registries, or export to the client.
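As a sketch of the two backends most relevant here (bucket and region are placeholders):

```shell
# GitHub Actions cache backend, used from inside a workflow:
docker buildx build --cache-to type=gha,mode=max --cache-from type=gha .

# S3 backend:
docker buildx build \
  --cache-to type=s3,region=us-east-1,bucket=build-cache,mode=max \
  --cache-from type=s3,region=us-east-1,bucket=build-cache .
```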

Anything you want to stick it on is going to require copying the data, and yeah that's going to be expensive.

> Buildkit doesn't create special containers for itself? It's literally a service integrated into dockerd.

No, it's not. It's a utility container that is hidden from the normal "docker ps". You can see it easily when you use docker-compose with podman.

The easiest way to see it in regular Docker is to create a simple Dockerfile with 'RUN sleep 1000' at the end and start building it. Then enter the Docker host ("docker run -it --rm --privileged --pid=host justincormack/nsenter1") and do 'mount' to see the mounts.

You'll see that buildkit will have its own overlay tree ('/var/lib/docker/buildkit/containerd-overlayfs') and the executor will have its own separate branch too. However, they do share the layers. Now wait for the container to complete building and run it.

You'll see that the running container uses an entirely _different_ set of layers. There is no reuse of layers between the buildkit and the running image.

Yes, the Docker buildkit is technically a daemon that is co-located with dockerd and just runs in its own cgroup tree. But it might as well be remote, because the resulting image runs in a completely different environment.

And the way the image is transferred from buildkit is through the containerd. Which is another separate container in the "moby" namespace.

> No, it supports anything Buildkit can fetch: git, http, client dir... for that matter the client itself can shim that to be whatever it wants.

Any examples?

> You can cache to GitHub actions cache, S3, az blob, gcs, registries, or export to the client.

Go on, try it. Here's a minimal repro: https://gist.github.com/Cyberax/61e6b419cd338ae7c3a7c7098abe...

First, you can build the base image, with the GHA or registry cache. It works. But the `proto` stage will never use cache. The "base" image is supplied through an additional context.

If by "utility" container you mean the containers aren't managed under the same stack, that is true. Buildkit, at least prior to docker 29, executes runc directly. It is still using the same storage backend, though there is a shim involved to convert docker's (now deprecated) graph drivers to containerd's snapshotter interface, which is what Buildkit speaks. That's why there is a different tree. As of docker 29, containerd's storage is used by default. I can't recall if this uses containerd to execute containers or just for storage.

> > No, it supports anything Buildkit can fetch: git, http, client dir... for that matter the client itself can shim that to be whatever it wants.
>
> Any examples?

`--build-context foo=https://github.com/example/repo.git`

Then you can "FROM foo" or whatever you want to do with that context.

> First, you can build the base image, with the GHA or registry cache. It works. But the `proto` stage will never use cache. The "base" image is supplied through an additional context.

What are you expecting to cache here? Are you saying using an extra context like this is preventing it from using the cache?

Huh?

Each layer is a tarball.

So build your tarballs (concurrently!), and then add some metadata to make an image.

From your comment elsewhere it seems maybe you are expecting the docker build paradigm of running a container and snapshotting it at various stages.

That is messy and has a number of limitations, not the least of which is cross-compilation. Reproducibility is another. But in any case, that's definitely not what these rules are trying to do.

I don't quite understand how it handles running binaries then. For example, I want to do `bash -c "ls -la /"`. How would it run this command? It needs to assemble the filesystem at this point in the build process.

I guess the answer for Bazel is "don't do this"? Docker handles cross-compilation by using emulators, btw.

> “don’t do this”

Yes. The Bazel way is to produce binaries, files, and directories, and then create an image “directly” from these.

Much as you would create a JAR or ZIP or DEB.
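A minimal sketch of that flow using rules_oci, assuming a Bazel-built binary target named `:server` (all target and base names here are illustrative):

```python
# BUILD.bazel: layer a prebuilt binary onto a base image. No container is
# ever run during the build, so the output is reproducible byte-for-byte.
load("@rules_oci//oci:defs.bzl", "oci_image")
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")

pkg_tar(
    name = "app_layer",
    srcs = [":server"],  # a cc_binary / go_binary built by Bazel
)

oci_image(
    name = "image",
    base = "@distroless_base",
    tars = [":app_layer"],
    entrypoint = ["/server"],
)
```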

This is (1) fast, (2) small, and (3) more importantly, reproducible. Bazel users want their builds to produce artifacts that are exactly the same, for a number of reasons. Size is also nice… do you really need ls and dozens of other executables in your containerized service?

Most Docker users don’t care about reproducibility. They’ll apt-get install and get one version today and another version tomorrow.

Good? Bad? That’s a value judgement. But Bazel users have fundamentally different objectives.

> emulators

Yeah emulators is the Docker solution for producing images of different architectures.

Since Bazel doesn’t run commands as a running container, it skips that consideration entirely.

> Size is also nice…do you really need ls and dozens of other executables in your containerized service?

Yeah, I do. For debugging mostly :(

> Most Docker users don’t care about reproducibility. They’ll apt-get install and get one version today and another version tomorrow.

Ubuntu has daily snapshots. Not great, but works reasonably well. I tried going down the Nix route, but my team (well, and also myself) struggled with it.

I'd love to have fully bit-for-bit reproducible builds, but it's too complicated with the current tooling. Especially for something like mobile iOS apps (blergh).