I am on the opposite camp. Open models are starting to perform better. GPT 5.5 keeps on messing things up.

On the contrary, pi + glm + DeepSeek… bliss.

Fable was a different kind of beast though. Rip.

Every time I use Opus these days I go, "shut up... you are not Fable." Hard to imagine how just three days with it changed how I saw LLM use.

I really don't feel this way. Seemed pretty similar to me, noticeably better, but marginally. What am I missing?

It may depend on your specific workload. E.g. for regular webdev work Opus is more than adequate, but for heavy-duty data analysis, experimental stuff, and complex systems the difference was night and day.

I had only a few places where I did spot a difference but that difference was significant and I can imagine where people would be amazed.

It's interesting, I tried a decent amount of "heavy duty data analysis", and found it pretty similar. But a lot of what I did was about it finding and cobbling together the right things from our existing library of domain-specific tooling, which Opus is already good at. Perhaps it would have impressed me more if it were starting from zero.

What kind of "experimental stuff and complex systems" did you try that it excelled at?

Nothing. It had marginal gains. People just romanticize it cause it's gone.

Yes, I've just come to the end of implementing all the planning I did while Fable was available. And nothing now comes close to creating plans that could be coded and just worked like it did.

On a large C codebase, Claude hallucinates constantly, and GPT 5.5 gets there with a lot of help, but still gets things wrong.

I'm reluctantly starting to feel grateful that I went camping right over the window that Fable was out.

Same.

Yeah, Opus/GPT need multiple rounds of reviews from each other to get to clean auto review. Fable was like, it is done and indeed… crickets in bot comments. ‘No issues’ galore.

I wonder if this will hold as other models with different biases achieve parity.

GPT-5.5 has been really hard to beat imho. I've spent $$$ on Opus, DeepSeek V4 Pro and recently started to dogfood GLM-5.2 (which is not bad) but I cannot really trust any of them (almost blindly) like I can trust GPT-5.5. It gives me tremendous confidence. I cannot say the same for any of the others I mentioned.

Ditto on GLM 5.2 + DeepSeek V4 Flash combo.

For most important work (complex, cross-domain inquiries etc.), I still rely on Codex GPT 5.5 though.

How are you running glm and deepseek? Local or hosted? If the latter, where do you run it?

OpenCode has a $10/mo sub that includes both of those

how much does your setup cost you? just curious

>> I am on the opposite camp. Open models are starting to perform better. GPT 5.5 keeps on messing things up.

I'm working in a 600k+ LoC codebase that has complex domain-specific logic and lots of moving parts. I find that Codex 5.5 is pretty good at surgical fixes, but does not go out of its way to explore and figure out what those surgical fixes might break. So I only use it to work on parts of the system that are pretty isolated from everything else so that risk of regression is small.

I'm trying not to be the "you're holding it wrong" guy, but ... have you just tried telling it to explore the codebase for things it might break?