Hacker News

Maybe check out Docker Model Runner -- it's built on llama.cpp (in a good way -- not like Ollama) and handles I think most of what you're looking for?

https://www.docker.com/blog/run-llms-locally/

As far as how to find good models to run locally, I found this site recently, and I liked the data it provides:

https://localclaw.io/