Working on my codebase (~100KLoC across multiple Python modules), I felt that Fable was head and shoulders above the 4.x series. It was just relentless and always hell-bent on testing and proving its own work. It just tore through problems like an animal. I've never seen that behaviour in 4.5-4.8. I can't speak for OpenAI models as I don't use them, but Fable was in a different league, especially when tasked with long-horizon goals that involved reasoning at both a high and a low level to solve the task.

I have had the same experience. I can't believe that people couldn't tell the difference.

I think a lot of users likely use these models on small hobby projects and not some convoluted enterprise code base. When you're making yet another Space Invaders clone it really won't show much difference. Messy, complex code bases with layers of cruft from decades of patching - that's what separates the model boys from the men.

Yeah, and its browser usage on tough web apps/sites was also amazing. This is one of the cases where it is easy to tell the difference. It figured out very effectively how to find the right elements, whereas with previous LLMs I had to constantly babysit and unblock them during browser usage.

I've used Codex 5.5 and Claude. I pay for Claude out of my own pocket and use Codex at work. I can confidently say Codex 5.5 high is much better at going through large code bases (a couple of million lines of code) than Claude Fable/Opus, which does only what it has been told, while Codex covers all sorts of edge cases. Frankly, I am not going to miss a thing if they stop Fable.

Was gonna say the same thing. GP's description of Fable sounds a lot like my experience switching from Claude Code Opus-4.8 to Codex GPT-5.5.