I was using Claude until they banned Opencode, and now use GPT at my day job. I've been using DeepSeek through Opencode Go on the $10/mo plan, and I honestly can't tell much difference. It's just as capable, and makes the same kinds of dumb mistakes that the other two have been making since March. For the price, I'm more than happy with it.
It's interesting. 95% of the time you don't need the extra 5% of rigor that frontier models provide compared to the 10-100x cheaper Chinese equivalents.
The remaining 5% of the time you get a big boost for your high-reasoning problem-solving needs and avoid a lot of pain. Now, I just need to be able to predict accurately when I need this extra 5% and when I don't :)
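To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes a hypothetical cost unit of 1 per task on the cheap model, the 10-100x price multiplier mentioned above, and perfect routing, i.e. that you always know in advance which 5% of tasks need the frontier model (which is exactly the hard part). The function name and the figures are illustrative, not real pricing.

```python
def blended_cost(cheap_price, frontier_multiplier, frontier_share):
    """Average cost per task when only `frontier_share` of tasks
    are sent to the frontier model and the rest go to the cheap one."""
    frontier_price = cheap_price * frontier_multiplier
    return (1 - frontier_share) * cheap_price + frontier_share * frontier_price


cheap = 1.0  # hypothetical cost unit per task on the cheap model (assumption)

for multiplier in (10, 100):
    mixed = blended_cost(cheap, multiplier, frontier_share=0.05)
    always_frontier = cheap * multiplier
    print(f"{multiplier}x multiplier: mixed={mixed:.2f}, "
          f"always frontier={always_frontier:.2f}")
```

Under these assumptions, even at a 100x price gap the mixed approach costs a small fraction of always using the frontier model. Every misrouted task eats into that saving, so the estimate is an upper bound on the benefit.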
In that extra 5% of cases you will still need to help the AI over multiple turns and give it the information it needs. The model's reasoning alone is rarely enough to finish the task there, i.e. in those 5% of cases the AI just can't complete the task without a lot of help.
The trick I use is to get the model to come up with a phased plan, and then review it. If I spot anything that seems dumb, I give direction on how it should be done. Once the plan is finalized, the model can run through the steps fairly reliably. As long as you're intentionally making all the big decisions, things tend to work out well.
I have both subscriptions and I definitely feel GPT is better and more consistent, but when I hit my limits I don't miss it too much.
That's the whole point. The tool you have vs. the expensive tools you don't have because they're too expensive.
I don't feel like paying 100 times the price for a 1-5% better tool.
The cutting edge of LLM-based software engineering seems to be all about how to harness the "good enough" pseudo-intelligence of affordable consumer-level models to achieve practical results, through iterations, tests, harnesses, etc. And these models are getting smarter every month, including open-weight models people can run on their own machines and servers. We're not seeing big leaps as often as before, but progress hasn't plateaued yet; the models are getting better all the time.
It implies that eventually open-weight models like DeepSeek, which are self-hostable locally or on premises, will become good enough for more people and businesses, in terms of productivity gains versus cost. Consumer hardware will adapt to that demand, making it even more affordable and within reach.
Not sure how that speculation fits with the billions of dollars of investment that AI companies will need to convert to profit somehow.
I am not sure what I am doing wrong then. I have been using Claude for the last 7 months and from time to time I try other models like DeepSeek, Kimi, etc. Nothing even comes close to it. Claude is almost always (99.99%) a one-shot.
In my experience, there is a very specific use case of one-shotting complex, long tasks with relatively vague or incomplete descriptions where Opus does substantially better than all other models I've tried, including GPT 5.5, GLM 5.1 and DS4. It seems to be better at inferring unstated requirements and creating a complete, working, reasonably well-designed solution.
However, that's probably not how most professional developers use LLMs. I tend to give well-specified, more constrained tasks, and for those, I find that Opus performs worse than other models precisely because it tends to infer unstated requirements and do things I didn't want it to do. In this situation, GPT 5.5 works better for me because it does precisely what I ask and nothing more.
Same here. Claude isn't perfect. It still makes a lot of mistakes. But whenever I try GPT-5.5 it's ten times worse, and Claude just has to clean up GPT's mess.
You're obviously not doing anything wrong if it works for you.
It worked for me too, for months, when I was working on trivial web projects.
Around February of this year it got lobotomized, and I canceled my subscription at the end of March.
I am not going back.