Can’t say I'm a fan of packaging models as Docker images. Feels forced - a solution in search of a problem.
The existing stack - a server and a model file - works just fine. There doesn’t seem to be a need to jam an abstraction layer in there. The core problem Docker solves just isn’t there.
(disclaimer: I'm leading the Docker Model Runner team at Docker)
We are not packaging models as Docker images - indeed, that's the wrong fit and comes with all kinds of technical problems. It also feels wrong to package pure data (which is what models are) into an image, which is generally expected to be a runnable artifact.
That's why we decided to use OCI Artifacts instead, and to specify our own OCI Artifact subset that is better suited to the use case. The spec and implementation are OSS; you can check them out here: https://github.com/docker/model-spec
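To make that concrete: an OCI Artifact for a model is essentially a standard OCI manifest whose artifactType and layer media types mark the content as model data (e.g. a GGUF file) rather than runnable image layers. Here's a minimal sketch, written as an annotated Python dict since plain JSON can't carry comments; the media type strings, digests, and sizes are illustrative assumptions, not values from the spec:

```python
import json

# Rough sketch of an OCI Artifact manifest for a model.
# NOTE: all media types, digests, and sizes below are illustrative
# assumptions - see https://github.com/docker/model-spec for the real spec.
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    # artifactType signals "this is model data, not a runnable image"
    "artifactType": "application/vnd.docker.ai.model.config.v0.1+json",
    "config": {
        "mediaType": "application/vnd.docker.ai.model.config.v0.1+json",
        "digest": "sha256:<config-digest>",
        "size": 421,
    },
    "layers": [
        {
            # the model weights themselves, e.g. a GGUF file
            "mediaType": "application/vnd.docker.ai.gguf.v3",
            "digest": "sha256:<weights-digest>",
            "size": 4_280_000_000,
        }
    ],
}

print(json.dumps(manifest, indent=2))
```

The upside of staying inside the OCI envelope is that models can be pushed, pulled, and distributed through any OCI registry that supports artifacts, without pretending to be container images.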
> GPU acceleration on Apple silicon
There is at least one benefit. I'd be interested to see what their security model is.
Is this really a Docker feature, though? llama.cpp provides acceleration on Apple hardware; I guess you could create a Docker image with llama.cpp and an LLM and have mostly this feature.
Unfortunately not, since the container won't have access to the Apple silicon GPU. That's why, in our architecture, we have to run llama.cpp as a host process and wire it into the rest of the Docker Desktop architecture to make it easily accessible from containers.
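For anyone wondering what that looks like from inside a container: the host-side llama.cpp process is fronted by an OpenAI-compatible HTTP API. A minimal sketch, assuming it's reachable from containers at a special hostname like model-runner.docker.internal (the hostname, path, and model name here are illustrative assumptions; check the docs for the real values):

```python
import json
import urllib.request

# Assumed endpoint for the Model Runner's OpenAI-compatible API as seen
# from inside a container. Hostname and path are illustrative assumptions,
# not confirmed values.
BASE_URL = "http://model-runner.docker.internal/engines/v1"

payload = {
    "model": "ai/smollm2",  # hypothetical model reference
    "messages": [{"role": "user", "content": "Hello from a container!"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# The request leaves the container and hits the Model Runner on the host,
# which runs llama.cpp natively and therefore can use the Apple silicon GPU.
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

Since llama.cpp runs as a native host process rather than inside the VM, it gets direct Metal access, which is what makes the GPU acceleration possible in the first place.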