Yes, at this point it's almost a matter of which model's personality you prefer, since they're all fairly decent. OP just has to start downloading and trying them out. With 16GB of VRAM you can do partial offloading to DDR5 system RAM with llama.cpp and run anything up to about 30B (even dense), or even larger, at a "reasonable" speed for chat purposes, especially with selective tensor offload.
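For reference, a partial-offload setup in llama.cpp might look something like this (a sketch, not a tuned config: the model filename is a placeholder, and the exact `-ot` regex you want depends on the model architecture and how much VRAM is left over):

```shell
# Partial offload sketch for a ~30B model on a 16GB GPU with llama.cpp.
# -ngl 99 : try to put all layers on the GPU...
# -ot "...=CPU" : ...but override specific tensors (here the large FFN
#                 weights, matched by regex) to stay in system RAM.
# Keeping attention on the GPU and FFN weights in DDR5 is the usual
# "tensor offload" trick for squeezing a dense 30B into limited VRAM.
llama-server \
  -m ./my-30b-model-Q4_K_M.gguf \
  -ngl 99 \
  -ot "ffn_(up|down|gate).*=CPU" \
  -c 8192
```

How much of the FFN you push to CPU is a tradeoff: the more tensors the regex matches, the less VRAM you need but the slower generation gets, so it's worth experimenting per model.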

I wouldn't count Qwen as much of a conversationalist, though. Mistral Nemo and Small are pretty decent. All of the Llama 3.x models are still very good even by today's standards. The Gemma 3s are great but a bit unhinged. And of course QwQ when you need GPT-4 at home. And probably lots of others I'm forgetting.