Maybe check out Docker Model Runner. It's built on llama.cpp (in a good way, not like Ollama), and I think it handles most of what you're looking for.
https://www.docker.com/blog/run-llms-locally/
As for finding good models to run locally, I found this site recently and liked the data it provides: