+1 to this, I've found GPT/Codex models consistently stronger in engineering tasks (such as debugging complex, cross-systems issues, concurrency problems, etc).
I use both OpenAI and Anthropic models, though for different purposes, what surprises me is how underrated GPT still feels (or, alternatively, how overhyped Anthropic models can be) given how capable it is in these scenarios. There also seems to be relatively little recognition of this in the broader community (like your recent YouTube video). My guess is that demand skews toward general codegen rather than the kind of deep debugging and systems work where these differences really show.
Or rather, it’s hard to ask everyone to side-by-side compare both products on their use cases. So the choice really comes down to word-of-mouth even though their use cases may be better served by Codex.
It's surprising to me how much LLM "personality" seems to matter to people, more than actual capability.
I do turn to Anthropic for ideation and non-tech things. But I find little reason to use it over codex for engineering tasks. Sometimes for planning, but even there, 5.4 is more critical of my questionable ideas, and will often come up with simpler ways to do things (especially when prompted), which I appreciate.
And I don't do hard-tech things! I've chosen a b2b field where I can provide competent products for a niche that is underserved and where long term relationships matter, simply because I'm not some brilliant engineer who can completely reinvent how something is done. I'm not writing kernels or complex ML stacks. So I don't really understand what everyone is building where they don't see the limits of Opus. Maybe small greenfield projects with few users.
> It's surprising to me how much LLM "personality" seems to matter to people, more than actual capability. > I do turn to Anthropic for ideation and non-tech things. But I find little reason to use it over codex for engineering tasks. Sometimes for planning, but even there, 5.4 is more critical of my questionable ideas, and will often come up with simpler ways to do things (especially when prompted), which I appreciate.
Aren't you saying here that the LLM personality matters to you, too? Being critical of you is a personality attribute, not a capabilities one.
Not necessarily. Criticism is the analysis, evaluation, or judgment of the qualities of something. This is a matter of intellectual act. However, you could say that being habitually critical can be partly a result of "personality" or temperament.
(Of course, strictly speaking, LLMs have neither temperament, "personality", nor intellect, but we understand these terms are used in an analogical or figurative fashion.)
> I'm not some brilliant engineer who can completely reinvent how something is done
With an honest evaluation of your own capabilities you are already far above average. Also its hard to see the insane amount of work that often was necessary to invent the brilliant stuff and most people can not shit that out consistently.
I use codex for cleaning after cloude and it always finds so many bugs, some of them quite obvious.