Hacker News

Ollama[0] has a collection of models that are either already small or quantized/distilled, and come with hyperparameters that are pretty reasonable, and they make it easy to try them out. I recommend you install it and just try a bunch because they all have different "personalities", different strengths and weaknesses. My personal go-tos are:

Qwen3 family from Alibaba seem to be the best reasoning models that fit on local hardware right now. Reasoning models on local hardware are annoying in contexts where you just want an immediate response, but vastly outperform non-reasoning models on things where you want the model to be less naive/foolish.

Gemma3 from google is really good at intuition-oriented stuff, but with an obnoxious HR Boy Scout personality where you basically have to add "please don't add any disclaimers" to the system prompt for it to function. Like, just tell me how long you think this sprain will take to heal, I already know you are not a medical professional, jfc.

Devstral from Mistral performs the best on my command line utility where I describe the command I want and it executes that for me (e.g. give me a 1-liner to list the dotfiles in this folder and all subfolders that were created in the last month).

Nemo from Mistral, I have heard (but not tested) is really good for routing-type jobs, where you need something with to make a simple multiple-choice decision competently with low latency, and is easy to fine-tune if you want to get that sophisticated.

[0] https://ollama.com/search