Codex CLI had some huge upgrades in the past few months.
Before the GPT-5 release it was a poor imitation IMO - in the macOS terminal it somehow even disabled copy and paste!
Codex today is almost unrecognizable in comparison to that version. It's really good. I use both it and Claude Code almost interchangeably at the moment and I'm not really feeling that one is notably better than the other.
Between Codex and Claude Code, I'm using each on an entirely different project, so it's not a fair comparison, but in my unscientific testing Claude has a few advantages. One is a better plan mode (though needing the magic password "ultrathink" suggests I just haven't learned to prompt it properly), which results in a better and longer Todo.md. Another is a more advanced ability to run things in the background: it can keep "npm run dev" or some other webserver running, hit it with curl or whatever, and gather logs from the webserver, error messages or otherwise. Claude Code can also be told to run things in sub-agents and work on multiple things simultaneously; if there's a way to get Codex to do that, it won't tell me and says it can't. Not that I necessarily believe it. My experience with Codex is that, despite instructions in AGENTS.md, it'll forget and get its wires crossed. I'm using --model gpt-5-codex, and Codex reports version 0.42.0.
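For anyone curious, the background-server workflow described there boils down to a handful of shell commands. A rough sketch, using Python's built-in http.server as a stand-in for "npm run dev" (the port and log file name are made up, not anything either agent hard-codes):

```shell
# Hypothetical sketch of the background-server loop an agent performs.
python3 -m http.server 8000 >server.log 2>&1 &    # stand-in for `npm run dev`
SERVER_PID=$!
sleep 1                                           # give the server a moment to bind
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8000/   # probe it
kill "$SERVER_PID"
cat server.log                                    # gather whatever the server logged
```

The agent's version of this is just issuing the same commands itself, keeping the PID around so it can poll the server and tear it down when done.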
My terminal agent is a Go bubbletea app and it too disables copy/paste and I haven't bothered to figure out why. Of course, I am also not the 5th most visited site on the Internet (more's the pity).
One workaround I found was using iTerm2 instead of Terminal.app and then using a weird keyboard combo; I think it was Cmd + Option + mouse drag.
Codex with gpt-5-codex (high) is like an outsourced consultant. You give them the specs and a while later you get the output. It doesn't communicate much during the run (especially the VSCode plugin is really quiet).
Then you check the result and see what happened. It's pretty good at one-shotting things if it gets the gist, but if it goes off the rails you can't go back three steps and redirect.
On the other hand, Claude Code is more like pair programming: it chats away while doing things, telling you what it's doing and why "out loud". It's easier to interrupt when you see it going off track; it'll just stop and ask for more instructions (unlike Copilot, where if you don't want it to rm the database file you need to be really fast and skip the operation AND hit the stop button below the chatbox).
I use both regularly: GPT is for when I know what to do and have it typed out; Claude is for experimenting and dialogue, "what would be a good idea here?" type of stuff.
I've found it slower than Claude but:
- significantly less obsequious (far fewer of the "you're absolutely right" responses Claude vomits out on every interaction)
- less likely to forget and ignore context and AGENTS.md instructions
- fewer random changes claiming "now everything is fixed" in the first 30-50% of context
- better understanding of usage rules (see link below), one-shotting quite a few things Claude misses
Language + framework: Elixir, Phoenix LiveView, Ash, Oban, Reactor
SLoC: 22k lines
AGENTS.md: some simple instructions, pointing to two MCPs (Tidewave and Stripe), requirement to compile before moving onto next file, usage rules https://hexdocs.pm/usage_rules/readme.html
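As a sketch, an AGENTS.md along those lines might look something like this; the wording and section layout are invented, and only the two MCPs, the compile-before-moving-on requirement, and the usage-rules link come from the comment (the `mix compile` command is an assumption based on it being an Elixir project):

```markdown
# AGENTS.md (hypothetical reconstruction)

## Tools
- Use the Tidewave MCP for runtime introspection of the running app.
- Use the Stripe MCP for payment-related work.

## Workflow
- Run `mix compile` and fix all warnings/errors before moving on to the next file.

## Usage rules
- Follow the package usage rules: https://hexdocs.pm/usage_rules/readme.html
```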
I've found GPT-5-Codex (the model used by default by OpenAI Codex CLI) to be superior but, as others have stated, slower.
Caveat: it requires a Linux environment, macOS, or WSL.
In general, I find that it will write smarter code, perform smarter refactors, and introduce less chaos into my codebase.
I'm not talking about toy codebases. I use agents on large codebases with dozens of interconnected tools and projects. Claude can be a bit of a nightmare there because it's quite myopic. People rave about it, but I think that's because they're effectively vibe-coding vastly smaller, tight-scoped things like tools and small websites.
On a larger project, you need a model that takes the care to see what existing patterns your code uses, whether something's already been done, and so on. Claude tends to be very fast but generates redundant or comical code (let's try this function 7 different ways so that one of them passes). This is junior-coder bullshit. GPT-5-Codex isn't perfect, but there's far, far less of that. It takes maybe 5x longer, but it generates something I have more confidence in.
I also see Codex using tools in smarter ways. If it's refactoring, it'll often use tools to copy code rather than re-writing it. Re-writing code is how so many bugs have been introduced by LLMs.
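The "copy with tools rather than re-type" pattern is just ordinary shell usage. A hypothetical example (the file and class names are invented for illustration; the first line only sets up a toy module so the snippet is self-contained):

```shell
# Set up a toy module to refactor (illustration only):
mkdir -p src && printf 'class Parser:\n    pass\n' > src/parser.py

# Copy the code byte-for-byte with cp instead of having the model re-emit it,
# then make the one targeted edit with sed; transcription bugs can't creep in.
cp src/parser.py src/parser_v2.py
sed -i 's/class Parser\b/class ParserV2/' src/parser_v2.py
grep -n 'class ParserV2' src/parser_v2.py   # verify only the intended change landed
```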
I've not played with Sonnet 4.5 yet so it may have improved things!
Codex CLI doesn't require Linux or WSL. I've been using it on Windows all week. That said, the model seems to get confused by PowerShell from time to time, but who doesn't?
You should really try it in WSL or proper Linux; the experience is vastly different. I've mostly been using Codex (non-interactively, though) on Linux for a long time. I tried it on Windows just the other day for the first time, and quoting plus PowerShell really seems to confuse it. It was borderline unusable for me, since it spent most of its reasoning figuring out the right syntax for the tooling; on Linux there's barely any of that.
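The quoting trouble is easy to picture: the same command a model emits for a POSIX shell needs different escaping in PowerShell. A small bash illustration (the PowerShell behavior is described only in comments, from memory, so treat it as approximate):

```shell
# In POSIX shells, single quotes are fully literal; $ only interpolates
# inside double quotes.
pattern='TODO: fix $bug'
printf '%s\n' "$pattern"        # prints: TODO: fix $bug
# PowerShell flips some of this: "$bug" interpolates inside double quotes there
# too, but the escape character is the backtick rather than the backslash, and
# single-quoted strings escape an embedded quote by doubling it (''). A model
# that assumes bash rules keeps emitting commands that need another round-trip
# of reasoning to repair, which is where the wasted tokens go.
```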
You're right. I tried it out this morning, and it uses fewer tokens and gets the job done quicker, without wasting so much time on the PowerShell nonsense. I was resisting setting up WSL because this is just a temporary workstation, but it was worth it.
I wouldn’t say it’s particularly good, at least not based on my limited experience. While some people argue it’s significantly better, I’ve found it to be noticeably slower than Claude, often taking two to three times longer to complete the same tasks. That said, I’m open to giving it another try when I have more time; right now my schedule is too tight to accommodate its slower pace.