Hacker News

Why would anyone use Ollama at all (aside from the obvious reasons one can look up online)? llama.cpp used directly, without this wrapper, is faster.

Basically, one has two real choices for local LLMs: llama.cpp (for a single user) or vLLM (for multi-user/enterprise use).