Is this really a Docker feature, though? llama.cpp provides acceleration on Apple hardware, so I guess you could create a Docker image with llama.cpp and an LLM and have mostly this feature.
Unfortunately not, since the container won't have access to the Apple silicon GPU.
That's why we have to run llama.cpp as a host process and wire it into the rest of the Docker Desktop architecture, so that it's easily accessible from containers.
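To make that concrete, here is a minimal sketch of what a container-side client could look like. It assumes (these details are illustrative, not part of the original answer) that llama.cpp's llama-server is running as a host process on port 8080 and that the container reaches the host through Docker Desktop's host.docker.internal alias:

```python
# Minimal sketch, run *inside* a container on Docker Desktop.
# Assumptions (not from the original text): llama-server is listening on
# the host at port 8080, and host.docker.internal resolves to the host.
import json
import urllib.request

payload = {
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    "http://host.docker.internal:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    # llama-server exposes an OpenAI-compatible chat completions format.
    print(body["choices"][0]["message"]["content"])
```

The point of the host-process design is exactly this: the inference engine gets direct access to the Apple silicon GPU, while containers only need an ordinary HTTP connection to use it.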