Unfortunately not: on macOS, containers run inside a Linux virtual machine, which has no access to the Apple silicon GPU. That's why, in our architecture, we run llama.cpp as a host process, where it can use the GPU via Metal, and wire it into the rest of Docker Desktop so that it's easily accessible from containers.
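
As a rough illustration of what "accessible from containers" can look like, here is a minimal sketch of a container calling a llama.cpp server running on the host. It assumes the host is running llama.cpp's OpenAI-compatible server (e.g. `llama-server -m model.gguf --port 8080`) and that the container reaches the host through Docker Desktop's `host.docker.internal` DNS name; the hostname, port, and endpoint that Docker Desktop's own integration exposes may differ from this.

```python
# Sketch: reach a host-run llama.cpp server from inside a container.
# Assumptions (not from the original text): llama-server listens on the
# host at port 8080 with its OpenAI-compatible API, and the container
# resolves the host via Docker Desktop's `host.docker.internal`.
import json
import urllib.request

payload = {
    # With a single loaded model, llama.cpp doesn't require a real model name.
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello from a container!"}],
}

req = urllib.request.Request(
    "http://host.docker.internal:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

The key point is that inference itself happens in the host process, which has Metal access to the GPU; the container only talks to it over a local network boundary.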