Anyone actually doing this? The DeepSeek-R1 32b on Ollama can't run on an RTX 4090, and the 17b is nowhere near as good at coding as the OpenAI or Claude models.

I specified autocomplete; I'm not running a whole model, asking it to build something, and waiting for the output.

DeepSeek-coder-v2 is fine for this. I occasionally use a smaller Qwen3 (I forget exactly which at the moment... set and forget) for some larger queries about code. Given my fairly light use cases and pretty small contexts, it works well enough for me.
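
For anyone curious what the plumbing looks like: here's a minimal sketch of the request an autocomplete plugin makes against a local Ollama server, assuming the default localhost:11434 endpoint and deepseek-coder-v2 pulled locally. The fill-in-the-middle token spelling is from memory, so verify it against the model card before relying on it.

```python
# Minimal sketch of what an autocomplete plugin does under the hood:
# POST a fill-in-the-middle prompt to a local Ollama server and read
# back a short completion. Assumes `ollama pull deepseek-coder-v2`
# has been run and the server is on its default port.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def complete(prefix: str, suffix: str, max_tokens: int = 64) -> str:
    # DeepSeek-Coder's FIM template -- token spelling from memory,
    # double-check the model card before trusting it.
    prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"
    payload = json.dumps({
        "model": "deepseek-coder-v2",
        "prompt": prompt,
        "raw": True,      # skip the chat template; we built the prompt ourselves
        "stream": False,  # one JSON blob back instead of a token stream
        "options": {"num_predict": max_tokens, "temperature": 0.2},
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    before = "def fib(n):\n    "
    after = "\n\nprint(fib(10))\n"
    print(complete(before, after))
```

Keeping num_predict small is the point: autocomplete only needs a line or two back, which is why these smaller local models hold up fine at it.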