Whatever Claude Code is doing in the client/prompting is making much better use of 3.7 than any other client I'm using that also uses 3.7. This is especially true for when you bump up against context limits; it can successfully resume with a context reset about 90% of the time. MCP Commander [0] was built almost 100% using Claude Code and pretty light intervention. I immediately felt the difference in friction when using Codex.

I also spent a couple hours picking apart Codex with the goal of adding Sonnet 3.7 support (almost there). The actual agent loop they're using is very simple. Not to say that's a bad thing, but they're offloading all planning and workflow execution to the agent itself. That's probably the right end state to shoot for long-term, but given the current state of these models I've had much better success offloading task tracking to some other thing - even if that thing is just a markdown checklist. (I wrote about my experience [1] building AI Agents last year.)

[0]: https://mcpcommander.com/

[1]: https://mg.dev/lessons-learned-building-ai-agents/