> Since you're on HN, I'd recommend skipping Ollama and LMStudio.

I disagree. With Ollama I can set up my desktop as an LLM server, interact with it over WiFi from any other device, and let Ollama switch seamlessly between models whenever I want to swap. Unless something has changed recently, with llama.cpp's CLI you still have to shut the server down and restart it with a different command-line flag in order to switch models, even when it's running in server mode.

That kind of overhead gets in the way of experimentation and can also limit applications: some little apps I've built rely on being able to swap quickly between a 1B and an 8B or 30B model just by changing the model parameter in the web request, as in the sketch below.
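To make that concrete, here's a minimal sketch of the pattern, assuming Ollama is running locally on its default port (11434) and that the models have already been pulled; the model tags are just example placeholders:

```python
import requests

# Assumes Ollama listening on its default port, with a small and a large
# model already pulled (the tags below are just examples).
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(model: str, prompt: str) -> str:
    # Ollama loads whichever model the request names; no server restart needed.
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# Cheap 1B model for a quick pass, bigger 8B model for the real answer.
print(ask("llama3.2:1b", "In one word, what language is 'bonjour'?"))
print(ask("llama3.1:8b", "Explain where the word 'bonjour' comes from."))
```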

llama.cpp can set up a REST server with an OpenAI-compatible API, so you can get many front-end LLM apps to talk to it the same way they talk to ChatGPT, Claude, etc. And you can connect to that machine from another one on the same network through whatever host and port you bind it to. See llama-server.
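For example, you might launch it with something like `llama-server -m model.gguf --host 0.0.0.0 --port 8080`, then point any OpenAI-style client at it from another machine. A sketch, with a made-up LAN address:

```python
from openai import OpenAI

# Hypothetical LAN address and port; llama-server doesn't require an API
# key by default, so the key here is just a placeholder string.
client = OpenAI(base_url="http://192.168.1.50:8080/v1", api_key="not-needed")

reply = client.chat.completions.create(
    # llama-server serves whatever model it was launched with, so this
    # name is mostly informational.
    model="local-model",
    messages=[{"role": "user", "content": "Hello from across the LAN."}],
)
print(reply.choices[0].message.content)
```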

When you get Ollama to "switch seamlessly" between models, it's still simply reloading a different model with llama.cpp, which is what it's built on.
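You can actually see that reload happen by timing a cold request against a warm one. A rough sketch, assuming the same local Ollama setup and an example model tag as above:

```python
import time
import requests

def timed_ask(model: str, prompt: str) -> float:
    # Time one non-streaming generate call against a local Ollama server.
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

# First call pays the model-load cost; the second hits the already-resident model.
print(f"cold: {timed_ask('llama3.1:8b', 'hi'):.1f}s")
print(f"warm: {timed_ask('llama3.1:8b', 'hi'):.1f}s")
```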

I prefer llama.cpp because doing things "seamlessly" obscures the way things work behind the scenes, which is what I want to learn and play with.

Also (I'm not sure if it's still the case, but it used to be), when llama.cpp gets adjusted to work with the latest model, it sometimes takes a while for the downstream bindings that Ollama builds on to catch up. It happened with one of the Llama releases, I forget which one, where people said: "Don't try this model with Ollama yet. They're waiting on the llama.cpp folks to update llama-cpp-python with the latest changes, and once they do, Ollama will bring those into the app and we'll be up and running. Be patient."

> I prefer llama.cpp because doing things "seamlessly" obscures the way things work behind the scenes, which is what I want to learn and play with.

And that's a fine choice, but some of us actually just want to hack with the models, not hack on them. Ollama is great for that, and SSHing into my server to restart the process every time I want to change models just doesn't work for me.

I'd rather wait a few weeks for the newest model and be able to alternate between models easily than stay on the bleeding edge and sacrifice that.