People's mileage may vary, but in my case, this was so bad that I actually got angry while trying to use it.
It's slow and stupid. It does not do proper research. It does not follow instructions. It randomly decides to stop being agentic, and instead just dumps the code for me to paste. It has the extremely annoying habit of just doing stuff without understanding what I meant, making a mess, then claiming everything is fine. The outdated training data is extremely annoying when working with Nuxt 4+. It is not creative at solving problems. It doesn't show its thinking. The undo feature doesn't give proper feedback on the diff or on whether it actually did "undo." And I hate the personality. It HAS to be better than it comes across for me, because I am actually in a bad mood after having worked with it. I would rather YOLO code with Gemini 3 Flash, since it's actually smarter in my assessment, and at least I can iterate faster, and it feels like it has better common sense.
Just as an example, I found an old, terrible room-reservation app I made years ago for our firm. I told it to update from Bootstrap to Flowbite UI. Codex just took forever to make a mess, installed version 2.7 when 4.0.1 is the latest, even when I explicitly stated that it should use the absolute latest version. Then it tried to install that and failed, so it reverted to the outdated CDN.
I gave the same task to Claude Code. Same prompt. It one-shotted it quickly. Then I asked it to swap out ALL the fetch logic to have SPA-like functionality with the new beta 4 version of HTMX, and it one-shot that too in the time Codex spent just trying to read a few files in the project.
This reminds me of the feeling I had when I got the Nokia N800. It was so promising on paper, but the product was so bad and terrible to use that I knew Nokia was done for. If that was their take on what an acceptable smartphone could be, it proved that the whole foundation was doomed. If this is OpenAI's take on what an agentic coding assistant should be, something that can run by itself and iterate until it completes its task in an intelligent and creative way... then OpenAI is doomed.
If you're using 5.2 high, with all due respect, this has to be a skill issue. If you're using 5.2 Codex high — use 5.2 high. gpt-5.2 is slow, yes (ok, keeping it real, it's excruciatingly slow). But it's not the moronic caricature you're saying it is.
If you need it to be up to date with your version of a framework, then ask it to use the Context7 MCP server. Expecting training data to be up to date is unreasonable for any LLM, and we now have useful solutions to the training data issue.
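For reference, hooking Context7 up to the Codex CLI is roughly a one-off config entry along these lines (a sketch only; the config path and schema are assumptions about how your Codex version wires MCP servers, so check its docs):

    # ~/.codex/config.toml (assumed location; verify against your Codex CLI docs)
    [mcp_servers.context7]
    command = "npx"
    args = ["-y", "@upstash/context7-mcp"]

After that, saying "use context7 to look up the Nuxt 4 docs" in the prompt is generally enough to get current API details instead of training-data guesses.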
If you need it to specify the latest version, don't say "latest". That word would be interpreted differently by humans as well.
Claude is well known for its one-shotting skills. But that comes at the expense of strict instruction following and thinner context (it doesn't spend as much time gathering context in larger codebases).
I am using GPT-5.2 Codex with reasoning set to high via OpenCode and Codex, and when I ask it to fix an E2E test, it tells me that it fixed it and prints a command I can run to test the changes, instead of checking whether it fixed the test and looping until it did. This is just one example of how lazy/stupid the model is. It _is_ a skill issue, on the model's part.
Non-Codex GPT-5.2 is much better than Codex GPT-5.2 for me. It does everything better.
Yup, I find it very counter-intuitive that this would be the case, but I switched today and I can already see a massive difference.
It fits with the intuition that codex is simply overfitted.
Yeah, I meant it more like it's not intuitive to me why OpenAI would fumble it this hard. They have got to have tested it internally and seen that it sucked, especially compared to GPT-5.2.
Codex runs in a stupidly tight sandbox and because of that it refuses to run anything.
But using the same model through pi, for example, it's super smart because pi just doesn't have ANY safeguards :D
I'll take this as my sign to give Pi a shot then :D Edit: I don't want to speak too soon, but this Pi thing is really growing on me so far… Thank you!
Wait until you figure out you can just say "create a skill to do..." and it'll just do it, write it in the right place and tell you to /reload
Or "create an extension to..." and it'll write the whole-ass extension and install it :D
i refuse to defend the 5.2-codex models. They are awful.
Perhaps if he was able to get Claude Code to do what he wanted in less time, and with a better experience, then that's not a skill he (or the rest of us) wants to develop.
Talking LLMs off a ledge is a skill we will all need going forward.
still a skill issue, not a codex issue. sure, this line of critique is also one levied by tech bros who want to transfer your company's balance sheet from salaries to ai-SaaS(-ery), but in what world does that automatically make the tech fraudulent or even deficient? and since when is not wanting to develop a skill a reasonable substitute for anything? if my doctor decided they didn't want to keep up on medical advances, i would find a different doctor. yet somehow finding fault with an ai because it can't read your mind and, in response to that adversity, refusing to introspect at all about why that might be and blaming it on the technology is a reasonable critique? somehow we have magically discovered a technology to manufacture cognition from nothing more than the intricate weaving of silicon, dopants, et al., and the takeaway is that it sucks because it is too slow, doesn't get everything exactly right, etc.? and the craziest part is that the more time you spend with it, the better intuition you get for getting whatever it is you want out of it. but, yeah... let's lend even more of an ear to the head-in-sand crowd -- that's where the real thought leaders are. you don't have to be an ai techno-utopian maximalist to see the profound worthiness and promise of the technology; these things are manifestly self-evident.
Sure, that's fine. I wrote my comment for the people who don't get angry at AI agents after using them for the first time within five hours of their release. For those who aren't interested in portending doom for OpenAI. (I have elaborate setups for Codex/Claude btw, there's no fanboying in this space.)
Some things aren't common sense yet so I'm trying my part to make them so.
common sense has the misfortune of being less "common" than we would all like it to be. because some breathless hucksters are overpromising and underdelivering in the present, we may as well throw out the baby, the bath water, and the bath tub itself! who even wants computers to think like humans and automate jobs that no human would want to do? don't you appreciate the self-worth that comes from menial labor? i don't even get why we use tractors to farm when we have perfectly good beasts of burden to do the same labor!
Feelings are information with just as much, or more, value as biased intellectualizing.
Ask Linus Torvalds.
i have absolutely no idea whatsoever what this means
TBH, "use a package manager, don't specify versions manually unless necessary, don't edit package files manually" is an instructions that most agents still need to be given explicitly. They love manually editing package.json / cargo.toml / pyproject.toml / what have you, and using whatever version is given in their training data. They still don't have an intuition for which files should be manually written and which files should be generated by a command.
Agree, especially if they're not given access to the web, or if they're not strongly prompted to use the web to gather context. It's tough to judge models and harnesses by pure feel until you understand their proclivities.
Ty for the tip on context7 mcp btw
How would a person interpret the latest version of flowbite?
Ok. You do you. I'll stick with the models that understand what latest version of a framework means.
Agreed, had the same experience. Codex feels lazy - I have to explicitly tell it to research existing code before it stops giving hand-wavy answers. Doc lookup is particularly bad; I even gave it access to a Context7 MCP server for documentation and it barely made a difference. The personality also feels off-putting, even after tweaking the experimental flag settings to make it friendlier.
For people suggesting it’s a skill issue: I’ve been using Claude Code for the past 6 months and I genuinely want to make Codex work - it was highly recommended by peers and friends. I’ve tried different model settings, explicitly instructed it to plan first and only execute after my approval, tested it on both Python and TypeScript backend codebases. Results are consistently underwhelming compared to Claude Code.
Claude Code just works for me out of the box. My default workflow is plan mode - a few iterations to nail the approach, then Claude one-shots the implementation after I approve. Haven’t been able to replicate anything close to that with Codex
+1 to this. Been using Codex the last few months, and this morning I asked it to plan a change. It gave me generic instructions like 'Check if you're using X' or 'Determine if logic is doing Y' - I was like WTF.
Curious, are you doing the same planning with Codex, out-of-band or otherwise? In order to have the same measurable outcome you'd perhaps need to use Codex in a plan state (there are experimental settings, not recommended) or other means (an explicit, detailed, reusable prompt for planning a change). It's a missing feature if your preference is planning in the CLI (I do not prefer this).
You are correct in that this mode isn't "out of the box" as it is with Claude (but I don't use it in Claude either).
My preference is to have smart models generate a plan with provided source. I wrote (with AI) a simple Python tool that'll filter a codebase and let me select all files or just a subset. I then attach that as context and have smart models with large context (usually Opus, GPT-5.2, and Gemini 3 Pro in parallel) each give me their version of a plan. I then take the best parts of each plan, slap them into a single markdown file, and have Codex execute in a phased manner. I usually specify that the plan should be phased.
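In case it helps picture it, the filtering tool is conceptually something like this (a hypothetical minimal sketch, not the actual script; the skip list and extensions are assumptions, and the interactive subset selection is omitted):

    #!/usr/bin/env python3
    # Hypothetical sketch: walk a repo, skip vendored/build dirs, and dump
    # the selected source files into one blob to paste in as plan context.
    import sys
    from pathlib import Path

    SKIP_DIRS = {".git", "node_modules", "dist", ".venv", "__pycache__"}
    EXTS = {".py", ".ts", ".vue", ".js", ".json", ".toml", ".md"}  # assumed

    def collect(root: Path):
        for path in sorted(root.rglob("*")):
            if any(part in SKIP_DIRS for part in path.parts):
                continue
            if path.is_file() and path.suffix in EXTS:
                yield path

    def main():
        root = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
        for path in collect(root):
            print(f"\n===== {path.relative_to(root)} =====")
            print(path.read_text(encoding="utf-8", errors="replace"))

    if __name__ == "__main__":
        main()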
I prefer out-of-CLI planning because, frankly, it doesn't matter how well Codex or Claude Code dive in; they always miss something unless they read every single file and config. And if they do that, they tip over. Doing it out of band with specialized tools, I can ensure they give me a high-quality plan that aligns with the code and expectations, in a single shot (much faster).
Then Claude/Codex/Gemini implement the phased plan, either all at once or stepwise, with me testing the app at each stage.
But yeah, it's not a skill issue on your part if you're used to Plan -> Implement within Claude Code. The experimental /collab feature does this, but it's not supported and is more experimental than even the experimental settings.
I'm not taking OpenAI's side here, but have you reviewed what Claude did?
I only use Claude through the chat UI because it's faster and it gives me more control. I read most of it, and the code is almost always better than what I would do, simply because lazy-ass me likes to take shortcuts way too often.
I just want Anthropic to spend like two weeks making their own "Codex app", but with Opus.