FWIW, Ollama already does most of this:
- Cross-platform
- Sets up a local API server (quick sketch below)
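For instance, a minimal sketch of hitting that local API, assuming `ollama serve` is running on its default port 11434 and a model like llama3.2 has already been pulled:

```python
# Query a locally running Ollama server (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```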
The tradeoff is a somewhat higher learning curve, since you need to manually browse the model library and choose the model/quantization that best fits your workflow and hardware. OTOH, it's also open-source, unlike LM Studio, which is proprietary.
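Concretely, picking a quantization just means picking a tag from a model's library page. A sketch using the pull endpoint; the tag below (a ~4-bit quant of an 8B model) is only an example, and the tags that actually exist vary per model:

```python
# Pull a specific quantization tag instead of the model's default tag.
import requests

requests.post(
    "http://localhost:11434/api/pull",
    json={"model": "llama3.1:8b-instruct-q4_K_M", "stream": False},
)  # pulling a multi-GB model can take a while
```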
I assumed from the name that it only ran llama-derived models, rather than whatever is available at huggingface. Is that not the case?
No, they have quite a broad list of models: https://ollama.com/search
[edit] Oh, and apparently you can also run some models directly from HuggingFace: https://huggingface.co/docs/hub/ollama
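Per those docs, any GGUF repo on the Hub can be addressed as `hf.co/{username}/{repository}` (e.g. `ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF`). A sketch assuming that naming also works through the local API the same way it does on the CLI; the repo below is just the docs' example, substitute any GGUF repo:

```python
# Pull a GGUF model straight from HuggingFace, then generate with it.
import requests

name = "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF"
requests.post("http://localhost:11434/api/pull",
              json={"model": name, "stream": False})
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": name, "prompt": "Hello!", "stream": False},
)
print(resp.json()["response"])
```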