It's pretty funny, i'm a $200/m Claude subscriber and i've had little need to use anything else. However the more Claude has been restricting my workflow (notably around the recent IDE/-p usage change) the more i've been wanting to go elsewhere.
I'm concerned since i really want SOTA reasoning, but DeepSeek still has me interested.
> I'm concerned since i really want SOTA reasoning
I think you should give other models a try and see how much they differ from SOTA models. I did this and realized even Qwen-2.5-Max was enough. I am sure even Claude Sonnet 3.5 is enough for the things I play around with. I am not really striving for a Fields Medal in mathematics.
That's fair, neither am i - but i do tend to work in large, complex codebases full of legacy decisions. E.g. i have access to Sonnet (of course), but i choose to work solely in Opus because i find its output reads better, analyzes better, etc.
The "cost" of dumb models is just so high for me. E.g. every bad decision they make increases my frustration quite a bit. Despite putting a lot of effort into my workflow to reduce the number of decisions they make, they will always make some. So my hedge is always against that.. trying to reduce how insane they can be heh.
I gave a fairly complex reverse engineering task to DS-4 xhigh and GPT-5.5 xhigh today.
After about 6 hours, both ultimately failed to fully RE, however, there were some drastic differences:
DS stopped every 30 minutes or so, saying it did full RE and it should all work now, while in fact, it didn't complete even 1% of it. It also looked for shortcuts again and again, despite me prompting heavily that the specific shortcut may not be used. It was a complete and utter failure.
GPT-5.5, on the other hand, blew me away. It just did the right things, didn't jump to next steps until it was sure it completed the initial layers and had a full understanding of what's required. The only time I prompted it during the 6 hours was when I saw it going in the right direction and I could nudge it slightly towards an even better way. I never felt I was fighting it. Okay, maybe a little bit - after compaction, it sometimes would go on a "no I'm not helping you with reverse engineering" tangent, but it would resolve in a clean session.
I cancelled my Claude subscription a month ago, so I haven't tested that, but DeepSeek has reminded me a lot of how I worked with Opus 4.6/4.7. Which perhaps could be a positive sign to some, but GPT-5.5 showed me that the way claude/ds work is just way too annoying.
> DS stopped every 30 minutes or so, saying it did full RE and it should all work now, while in fact, it didn't complete even 1% of it. It also looked for shortcuts again and again, despite me prompting heavily that the specific shortcut may not be used. It was a complete and utter failure.
This is my experience with non-SOTA models across the board. When you try them on little tasks and they work it feels amazing, but then you go deeper and you're back to going in loops and fighting the model for hours.
Switching back to a SOTA model immediately yields progress again.
When I read all of the comments from people saying they can't tell a difference between Opus and <insert open weight model here> I don't know if they haven't really used it much yet, or if they're just not doing anything complicated.
Did you read the OP, where he's chiding exactly the model you're glazing?
Did you intentionally miss the point of my comment? Substitute Opus for GPT-5.5 if you will. I use both as well as locally hosted models using some of your branches, even.
Fair enough. I agree with you - although DS4 Pro is a GPT 5 class model which scores 46% on ARC-AGI-2[^1]. It's behind by maybe 9 months, but I think it's still good enough for a lot of complex tasks as well. They definitely need to work on a "just fucking works" harness like CC/Codex. Also thanks!
[^1] https://www.nist.gov/news-events/news/2026/05/caisi-evaluati...
What you’re experiencing is the difference in model intelligence. Most models can seem pretty good at simple stuff over short time horizons. Complex work requires that more intelligence be stuffed into those trillion-dimensional spaces.
The GPT models are heavily biased to a more incremental, empirical, evidence based approach. Sometimes to a fault. I prefer them for this reason, but it requires coaxing or strategic use of /goal to break it out of its highly staged, one piece at a time, approach.. if you don't like it.
I suspect for people doing more... website ... type development, the more "yeet this into existence" style of Opus feels preferable.
With Claude I was constantly jamming my finger on the escape key "wait, you did what?! based on what proof?!"
You make it sound as if Codex is for people who know what they want and Claude Code is for people who don’t know what they’re doing.
I was trying to not sound that biased, but ok ;-)
> i've been wanting to go elsewhere.
There's always the option of using Anthropic's models for some tasks like planning and then just handing over the implementation task to something like DeepSeek. Across different tools, a Markdown plan works pretty okay. That's what I'm planning to do if I go from the 5x Max subscription down to the Pro.
I am also writing a launcher that makes using 3rd party providers with Claude Code easy (https://ccode.kronis.dev) and I already have a local proxy up and running, just not dynamic model switching yet. Though it shouldn't be too hard to add, will probably be there within a week or two, depending on my schedule.
I don't think it's wise to leave Anthropic altogether because their models are great (and a subscription gives you features like Remote Control which I like), but switching tiers and maybe saving a bit of money seems viable! On the other hand, you do need a quality baseline, because I remember using Cerebras with GLM 4.6 way back and there was a bit too much slop.
If you want SOTA reasoning you should be using GPT 5.5 Pro.
This is fair, but i've found the different models to have different moods and require different interactions to get them to stick to just the specific edits i ask for, etc.
I used to surf the three big players frequently and got really tired of the effort needed to steer some models. In the end i stuck with Claude because it required less steering effort. While not strictly reasoning, a model's ability to follow clear directions consistently is something i'd consider part of its SOTA capabilities.
Eventually i just tired of exploring. I just want stability.
Which ironically is why i'm thinking about moving from Claude. The very basic IDE/-p usage getting removed from my plan is a UX stability issue. I'm trying to progressively improve my workflows and efficiency, not have to establish a new foundation anytime something shifts. Quite frustrating.
Codex has only GPT 5.5
You should definitely stick to the $200 plan, and not try the $10 coding plans with open weight models and higher limits. Anthropic needs your money to stay solvent, and you'll sleep better knowing you're using SOTA.
(Zero reason to defend Anthropic.)
I’ve gone that route. I really wanted to stop using Claude, but DeepSeek v4 Pro and Kimi 2.6 didn’t do the job. For a lot of coding tasks or well-specced plans, maybe… but then that’s a plan made by Opus anyway.
Even Sonnet is sometimes not worth the trouble. Opus is very thorough and reviews its own mistakes quite well. Catches a lot of edge cases.
I’m not saying we shouldn’t try other things (I did!), but it’s more or less okay that people just like Claude Code subscriptions? The back and forth I had with Kimi on a small feature came out to ~1.8€, which is 10% of my Claude subscription each month. And that was a single session. CC with Serena uses tokens fairly well.
/advisor is like the old /opusplan mode but for running tasks not just pre-planning. It can work nicely with Sonnet as the main agent and escalates to Opus as needed.
Advisor-mode has been very helpful indeed, I can now plan with Opus, have Haiku code, and escalate back to Opus for review. It’s a decent flow for Pro subscribers trying to max their usage. But as I’ve said above, sometimes it’s not worth it: Sonnet and Haiku can produce stuff that’s not worth reviewing.
The world would be better long-term if we chose to fund open models instead, however.
If you think short-term and only about yourself, paying for SOTA regardless of how many military contracts the lab has is the best thing, but paying for open models is better both ethically and for a future where AI belongs to everyone and not just to Altman et al.