You're not sharing what quantization you're using. In my experience, anything below Q8 and smaller than ~30B tends to be basically useless locally, at least for the things you'd typically use Codex et al. for; I'm sure it works for very simple prompts.

But as soon as you go below Q8, the models get stuck in repeating loops, get the tool-calling syntax wrong, or just start outputting gibberish after a short while.

Will do that in an edit to the post.

Sure, waiting :)

In the meantime, Ollama seems to default to "Q4_K_M", which is barely usable for anything and really won't work for agentic coding; the quantization level is just too low. Not sure why Ollama defaults to basically unusable quantizations, but that train left a long time ago. They're more interested in people thinking they can run stuff than in flagging the trade-off up front, and it's been that way since day one.
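If you want to check what quant you actually pulled, here's a minimal sketch against Ollama's /api/show endpoint, assuming a default install on localhost:11434; the model names are just examples, not recommendations.

```python
import json
import urllib.request

def quantization_of(model: str, host: str = "http://localhost:11434") -> str:
    """Ask a local Ollama instance which quantization a pulled model uses."""
    req = urllib.request.Request(
        f"{host}/api/show",
        data=json.dumps({"model": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        details = json.load(resp).get("details", {})
    return details.get("quantization_level", "unknown")

# Default library tags usually come back as "Q4_K_M"; explicit q8_0 tags
# (e.g. pulled via `ollama pull qwen2.5-coder:32b-instruct-q8_0`) report Q8_0.
print(quantization_of("qwen2.5-coder:32b"))
```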

Ollama is definitely not the way to go once your interest shifts from "how quickly can I run a new LLM" to "how do I use a local LLM to do things in a remotely optimal way".

I'm currently giving club3090 a try; it seems to have lots of pre-configured setups depending on the workflow. I'm trying vLLM first, then llama.cpp.
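For comparing setups, a quick way to see whether a given quant still produces well-formed tool calls is to hit the OpenAI-compatible endpoint that both vLLM and llama.cpp's llama-server expose. A minimal sketch below; the port, model name, and the get_weather tool are assumptions, adjust for your own setup.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"   # assumed vLLM default; llama-server usually uses 8080
MODEL = "qwen2.5-coder-32b-q8_0"        # hypothetical model name

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    message = json.load(resp)["choices"][0]["message"]

# Heavily quantized models often emit malformed JSON arguments, or skip the
# tool call entirely and answer in plain text.
for call in message.get("tool_calls") or []:
    args = call["function"]["arguments"]
    try:
        print("valid tool call:", call["function"]["name"], json.loads(args))
    except json.JSONDecodeError:
        print("malformed arguments:", args)
```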