llamacpp can set up a REST server with an OpenAI-compatible API, so you can get many front-end LLM apps to talk to it the same way they talk to ChatGPT, Claude, etc. And you can connect to that machine from another one on the same network through whatever port you set it to. See the llamacpp server example (llama-server).
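As a rough sketch of what that looks like (the model path, IP address, and port below are placeholders, and the launch flags are from memory, so check `llama-server --help` on your build): start the server on the box with the GPU, then point any OpenAI-style client at it from elsewhere on the network.

```python
# On the server machine (flags are assumptions, verify against your build):
#   ./llama-server -m ./models/some-model.gguf --host 0.0.0.0 --port 8080
#
# Then from another machine on the same LAN, use the standard OpenAI client
# with base_url pointed at the llama-server box. The api_key is ignored.
from openai import OpenAI

client = OpenAI(base_url="http://192.168.1.50:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local",  # llama-server serves whatever model it was launched with
    messages=[{"role": "user", "content": "Say hello from llama.cpp"}],
)
print(response.choices[0].message.content)
```

Any front end that lets you override the OpenAI base URL can be pointed at the same address.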
When you get Ollama to "switch seamlessly" between models, it's still just reloading a different model with llamacpp, which is what it's based on.
I prefer llamacpp because doing things "seamlessly" obscures the way things work behind the scenes, which is what I want to learn and play with.
Also, and I'm not sure if it's still the case, but it used to be: when llamacpp gets updated to work with the latest model, it sometimes takes a while for the Python API that Ollama uses to catch up. That happened with one of the Llama releases, I forget which one, where people said "oh yeah, don't try this model with Ollama yet, they're waiting on the llamacpp folks to update llama-cpp-python with the latest changes from llamacpp, and once they do, Ollama will pull it into their app and we'll be up and running. Be patient."
> I prefer llamacpp because doing things "seamlessly" obscures the way things work behind the scenes, which is what I want to learn and play with.
And that's a fine choice, but some of us just want to hack with the models, not on them. Ollama is great for that, and SSHing into my server to restart the process every time I want to change models just doesn't work for me.
I'd rather wait a few weeks for the newest model and be able to alternate easily than stay on the bleeding edge and sacrifice that.