Hi everyone, we're the model-runner maintainers.
We're rebooting the model-runner community and wanted to share what we've been up to and where we're headed.
When we first built this, the idea was simple: make running local models as easy as running containers. You get a consistent interface to download and run models across different backends (llama.cpp being a key one), and you can distribute them through familiar OCI registries like Docker Hub.
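To make that workflow concrete, here's a minimal sketch. It assumes the `docker model` CLI front-end and uses `ai/smollm2` and `myorg/my-model` purely as illustrative names; your models, namespaces, and registry will differ:

```sh
# Pull a model from an OCI registry (Docker Hub in this example) into local storage
docker model pull ai/smollm2

# Run the model locally and send it a prompt from the terminal
docker model run ai/smollm2 "Summarize what an OCI registry is in one sentence."

# Push a model to your own registry namespace, the same way you'd push an image
docker model push myorg/my-model
```

The point is that the pull/run/push loop mirrors the container workflow you already know, with the model artifact traveling through the same OCI infrastructure as your images.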
Recently, we've invested a lot of effort into making it a true community project. A few highlights:
- The project is now a monorepo, making it much easier for new contributors to find their way around.
- We've added Vulkan support to open things up for AMD and other non-NVIDIA GPUs.
- We made sure we have day-0 support for the latest NVIDIA DGX hardware.