I don't think tools like Claude are there yet, but I already trust GPT-5 Pro to be more diligent about catching bugs in software than I am, even when I'm trying to be very careful. If software is any guide, I expect that even just using these tools to review existing Excel spreadsheets could lead to a significant boost in quality (and Excel spreadsheets seem even worse than software when it comes to errors).

That said, Claude is still quite behind GPT-5 in its ability to review code, and so I'm not sure how much to expect from Sonnet 4.5 in this new domain. OpenAI could probably do better.

> That said, Claude is still quite behind GPT-5 in its ability to review code, and so I'm not sure how much to expect from Sonnet 4.5 in this new domain. OpenAI could probably do better.

It’s always interesting to see others’ opinions, since impressions are still so variable and “vibe”-based. Personally, for my use, the idea that any GPT-5 model is superior to Claude just doesn’t resonate - and I use both regularly for similar tasks.

I also find it interesting how subjective impressions of these models are, but for code review the gap I’ve seen between Sonnet 4.5 and GPT-5 Codex, and especially GPT-5 Pro, is pretty stark. GPT-5 is consistently much better at the hard logic problems that code review often involves.

I have had GPT-5 point out dozens of complex bugs to me. In those cases I often check whether other models can spot the same problems: Gemini occasionally has, but the Claude models never have (I’ve tried Opus 4, 4.1, and Sonnet 4.5). These are bugs like race conditions or deadlocks that arise from subtle interactions between different parts of the codebase. GPT-5 and Gemini can spot these with decent accuracy, while I’ve never had Claude point out a bug of this kind.
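
To give a flavor of what I mean, here’s a minimal, entirely hypothetical Python sketch (the names are invented, not from any real codebase) of a cross-module lock-ordering deadlock - the kind of bug where each function looks correct in isolation:

```python
# Hypothetical sketch: two functions, notionally in different modules,
# acquire the same two locks in opposite order. Each looks fine on its
# own, but two threads running them concurrently can deadlock.
import threading

account_lock = threading.Lock()
ledger_lock = threading.Lock()

def transfer(amount):
    # e.g. in accounts.py: takes account_lock, then ledger_lock
    with account_lock:
        with ledger_lock:
            pass  # move money, then record the ledger entry

def audit():
    # e.g. in ledger.py: takes ledger_lock, then account_lock
    with ledger_lock:
        with account_lock:
            pass  # read entries, then verify balances
```

Catching this requires correlating lock-acquisition order across files rather than reading any one function, which is why it’s a good test of whole-codebase reasoning.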

If you haven’t tried it, I’d run the Codex /review feature and compare its results to asking Sonnet for a review. For me, the difference in code review is very clear. For actual coding tasks, results from both models are more variable, but for code review I’ve never had an instance where Claude pointed out a serious bug that GPT-5 missed. And I use these tools for code review all the time.

I've noticed something similar. I've been working on some concurrency libraries for Elixir, and Claude constantly gets things wrong, while GPT-5 can recognize the techniques I'm using and the trade-offs.

Try the TypeScript Codex CLI with the gpt-5-codex model and reasoning always set to high, or GPT-5 Pro with max reasoning. Both are currently undeniably better than Claude Opus 4.1 or Sonnet 4.5 (max reasoning or otherwise) for all code-related tasks. Much slower, but more reliable and more intelligent.
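
For the Codex CLI, the reasoning setting lives in its config file. If I'm remembering the key names right (worth double-checking against the Codex docs), it's something like:

```toml
# ~/.codex/config.toml — key names from memory; verify against the Codex docs
model = "gpt-5-codex"
model_reasoning_effort = "high"
```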

I've been a Claude Code fanboy for many months but OpenAI simply won this leg of the race, for now.

Same. I switched from Sonnet 4 to Codex back when Sonnet 4 was current. I went back to try Sonnet 4.5, and it really hates working for longer than about five minutes at a time.

Codex, meanwhile, seems smarter and will plug away at a massive to-do list for a couple of hours.
