It may be the way I use it, but qwen3-coder (30B with Ollama) is actually helping me with real-world tasks. It's a bit worse than the big models for the way I use them, but absolutely useful. I do use AI tools with very specific instructions, though: file paths, line numbers when I can, specific direction about what to do, my own tools, etc., so that may be why I don't see such a huge difference from the big models.
I should try Kimi K2 too.
It has everything to do with the way you use it. And the biggest difference is how fast the model/service can process context. Everything is context. It's the difference between iterating on an LLM-boosted goal for an hour vs. 5 minutes. If your workflow involves chatting with an LLM, manually pasting chunks, manually retrieving the response, manually inserting it, and manually testing...
You get the picture. Sure, even last year's local LLM will do well in capable hands in that scenario.
Now try pushing over 100,000 tokens in a single call, every call, in an automated process. I'm talking about the type of workflow where you push over a million tokens in a few minutes, over several steps.
That's where the moat, no, the chasm, between local setups and a public API lies.
No one who does serious work "chats" with an LLM. They trigger workflows where "agents" chew on a complex problem for several minutes.
That's where local models fold.
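To make that concrete, here's a rough sketch of the kind of automated loop I mean, assuming an OpenAI-compatible endpoint (which Ollama happens to expose locally); the model tag, step list, and repo dump file are just placeholders, not anyone's actual pipeline:

    # Sketch: an automated multi-step loop that re-sends the full accumulated
    # context on every call, so prompt-processing speed dominates wall-clock time.
    from openai import OpenAI

    # Point at a local Ollama server (its OpenAI-compatible API lives under /v1);
    # swap base_url/model for a hosted API to compare turnaround on the same job.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    steps = [
        "Summarize the failing test output and locate the suspect module.",
        "Propose a fix as a unified diff.",
        "Review the diff against the original requirements.",
    ]

    # Placeholder: a dump of code + logs that can easily run past 100k tokens.
    context = open("repo_dump.txt").read()

    messages = [{"role": "system", "content": "You are a coding agent."}]
    for step in steps:
        # Each step appends to the history and ships the whole thing again.
        messages.append({"role": "user", "content": f"{step}\n\n{context}"})
        resp = client.chat.completions.create(
            model="qwen3-coder:30b",  # or a frontier model behind a public API
            messages=messages,
        )
        messages.append(
            {"role": "assistant", "content": resp.choices[0].message.content}
        )

Run three or four steps like that back to back and you've pushed a million-plus tokens; whether that takes minutes or the better part of an hour is almost entirely prompt-processing throughput.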
You'll see good results. Kimi is basically a microdosed Sonnet, lol. Very, very reliable tool calls, but because it's microdosing you don't want to use it for implementing OAuth; it's better for things like adding comments or following strict direction (i.e. a series of text mutations).