Hacker News

Ollama has a really good perk in that it makes it trivial to control which model is loaded into and unloaded from the GPU. So if you're using a frontend like LibreChat or Open WebUI, switching models is as easy as picking from the dropdown, without having to fiddle with the command line.
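
For anyone curious what that switch might look like without a frontend, here's a rough sketch against Ollama's HTTP API. It assumes Ollama is listening on its default local address, and it relies on the documented behavior that a request to `/api/generate` with no prompt loads a model, while `keep_alive` set to 0 unloads it. The helper names (`build_payload`, `post`, `switch_model`) are made up for illustration, and this is not necessarily how LibreChat or Open WebUI do it internally.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def build_payload(model, unload=False):
    """Build the JSON body for a load or unload request (hypothetical helper)."""
    # A generate request with no prompt just loads the model into memory.
    payload = {"model": model, "stream": False}
    if unload:
        # keep_alive=0 asks Ollama to unload the model right away.
        payload["keep_alive"] = 0
    return payload


def post(path, payload):
    """POST a JSON payload to the Ollama server and return the decoded reply."""
    req = urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def switch_model(old_model, new_model):
    """Unload one model, then preload another (needs a running Ollama server)."""
    post("/api/generate", build_payload(old_model, unload=True))
    return post("/api/generate", build_payload(new_model))
```

Calling `switch_model("old-model", "new-model")` with two models you have pulled should free the first and load the second. In normal use you don't even need the explicit unload, since Ollama loads whichever model a request names and evicts idle ones on its own, which is why the dropdown approach works so smoothly.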