Genuinely excited to try this out. I've started using Codex much more heavily in the past two months and honestly, it's been shockingly good. Not perfect, mind you, but it keeps impressing me with what it's able to "get". It often gets stuff wrong, and at times runs with faulty assumptions, but overall it's no worse than having average L3-L4 engs at your disposal.
That being said, the app is stuck at the launch screen, with "Loading projects..." taking forever...
Edit: A lot of links to documentation aren't working yet. E.g.: https://developers.openai.com/codex/guides/environments. My current setup involves having a bunch of different environments in their own VMs using Tart and using VS Code Remote for each of them. I'm not married to that setup, but I'm curious how it handles multiple environments.
Edit 2: Link is working now. Looks like I might have to tweak my setup to have port offsets instead of running VMs.
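For what it's worth, the port-offset idea can be sketched in a few lines of shell. The project names, base port, and offset of 100 per environment are all made up for illustration:

```shell
# Give each environment its own port range instead of its own VM.
# BASE_PORT and the per-environment offset of 100 are arbitrary choices.
BASE_PORT=3000

env_port() {
  # Port for environment number $1: BASE_PORT + N * 100
  echo $((BASE_PORT + $1 * 100))
}

echo "project-a -> port $(env_port 0)"
echo "project-b -> port $(env_port 1)"
echo "project-c -> port $(env_port 2)"
```

Each project's dev server, debugger, etc. then binds inside its own 100-port window, so nothing collides when several environments run side by side.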
I have the $20 a month subscription for ChatGPT and the $200/year subscription to Claude (company reimbursed).
I have yet to hit usage limits with Codex. I constantly hit them with Claude. I use them both the same way: hands on the wheel, very interactive, small changes, and I tell them both to keep a file updated with what's done and what's next as I test.
Codex gets caught in a loop more often when trying to fix an issue. I tell it to summarize the issue and what it's tried, and then I throw Claude at it.
Claude can usually fix it. Once it's fixed, I tell Claude to note it in the same file, and then I go back to Codex.
The trick to reaching the usage limit is to run many agents in parallel. Not that it's an explicit goal of mine, but I keep thinking of this blog post [0] and then try to get Codex to do as much for me as possible in parallel.
[0]: http://theoryofconstraints.blogspot.com/2007/06/toc-stories-...
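Mechanically, the fan-out can be as simple as backgrounding non-interactive runs and waiting on them. A rough sketch, where the task list and log layout are invented and `codex exec` stands in for whatever non-interactive invocation you use:

```shell
# Launch one agent per independent task, capturing each run's output in a log.
mkdir -p agent-logs
i=0
for task in "add retry logic to the uploader" \
            "write tests for the parser" \
            "refresh the README"; do
  codex exec "$task" > "agent-logs/task-$i.log" 2>&1 &   # run in the background
  i=$((i + 1))
done
wait   # block until every agent has finished
echo "ran $i agents in parallel"
```

The point is less the exact tool than the shape: independent tasks, fired off together, reviewed afterwards.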
Telling a bunch of agents to do stuff is like treating each one as a senior developer you trust to take an ambiguous business requirement and use their best judgment, coming to you only if they have a question.
But doing that with AI feels like hiring an outsourcing firm for a project, and they come back with an unmaintainable mess that's hard to reason through 5 weeks later.
I very much micromanage my AI agents and test and validate their output. I treat them like mid-level ticket-taking code monkeys.
My experience with good outsourcing firms is that they come back with heavily-documented solutions that are 95% of what you actually wanted, leaving you uncomfortably wondering if doing it yourself woulda been better.
I’m not fully sure what’s worse, something close to garbage with a short shelf life anyone can see, or something so close to usable that it can fully bite me in the ass…
I fully believe that if I didn't review its output and ask it to clean it up, it would become unmaintainable real quick. The trick I've found, though, is to be detailed enough in the design at both a technical and non-technical level, sometimes iterating a few times on it with the agent before telling it to go for it (which can easily take 30 minutes).
That’s how I used to deal with L4, except codex codes much faster (but sometimes in the wrong direction)
It’s funny over the years I went from
1. I like being hands on keyboard and picking up a slice of work I can do by myself with a clean interface that others can use - a ticket taking code monkey.
2. I like being a team lead /architect where my vision can be larger than what I can do in 40 hours a week even if I hate the communication and coordination overhead of dealing with two or three other people
3. I love being able to do large projects by myself, including dealing with the customer, where the AI can do the grunt work I used to have to depend on ticket-taking code monkeys to do.
Moral of the story: if you are a ticket-taking "I codez real gud" developer, you are going to be screwed no matter how many B-trees you can reverse on the whiteboard.
Moral of your story.
Each and every one of us is able to write their own story and come up with their own 'moral'.
Settling for less (if AI is a productivity booster, which is debatable) doesn't equal being screwed. There is wisdom in reaching your 'enough' point.
If you look at the current hiring trends and how much longer it is taking developers to get jobs these days, a mid-level ticket taker is definitely screwed between a flooded market, layoffs, and AI.
By definition, this is the worst AI coding will ever be, and it's pretty good now.
> By definition, this is the worst AI coding will ever be
This may be true, but it's not necessarily true, and certainly not by definition. For example, formal verification by deductive methods has improved over the past four decades, and yet, by the most important measures, it's got worse. That's because the size of software it can be used to verify affordably has grown, but significantly slower than the growth in the size of the average software project. I.e. it can be used on a smaller portion of software than it could be used on decades ago.
Perhaps ironically, some people believe that the solution to this problem is AI coding agents that will write correctness proofs, but that is based on the hope that their fate will be different, i.e. that their improvement will outpace the growth in software size.
Indeed, it's possible that AI coding will make some kinds of software so cheap that their value will drop to close to zero, and the primary software creation activity by professionals will shift precisely to those programs that agents can't (yet) write.
In the past these trends were cyclical though. We're coming from an expansion phase (mainly driven by the COVID IT and AI craze) and now going through stagnation towards recession (global manufacturing crisis pulling our service sector down with it). This mirrors the hiring trends (or demand for workers). I'm not sure why you wouldn't expect the pendulum to swing back at some point.
I have been in this industry a long time, since 1996.
The 2000 dot-com bust wasn't because all of the ideas were bad; most weren't. They were just too early, before high-speed internet was ubiquitous at home, let alone in everyone's pocket.
Incidentally, back then I was a regular old Windows enterprise developer in Atlanta and there were plenty of jobs available at boring companies.
2008 was a general shit show for everyone. But in tech, what we now know as the Big Tech companies were hiring like crazy and growing like crazy. Just based on the law of large numbers, they aren't going to grow over the next decade like they grew over the last one.
They have proven that they can keep going and keep dominating with fewer people. AI has already started automating the jobs of mid-level ticket takers, and it's only going to get worse. Just like factory jobs, those jobs aren't coming back.
I am really not convinced yet.
From all the data I have seen, the software industry is poised for a lot more growth in the foreseeable future.
I wonder if we are experiencing a local minimum on a longer upward trend.
Those that do find a job in a few days aren't online writing about it, so based on what is online we are led to believe that it's all doom and gloom.
We also come out of a silly growth period where anyone who could sort a list and build a button in React would get hired.
My point is not that AI-coding is to be avoided at all costs, it's more about taming the fear-mongering of "you must use AI or will fall behind". I believe it's unfounded - use it as much or as little as you feel the need to.
P.S.: I do think that for juniors it's currently harder and requires intentional effort to land that first job - but that is the case in many other industries. It's not impossible, but it won't come on a silver platter like it did 5-7 years ago.
I mean, it is online that major tech companies have laid off a couple of hundred thousand people. What companies are going to absorb all of these people?
Anyone who hires can tell you one open req gets hundreds of applicants within 24 hours. LinkedIn easy apply backs that up.
I have two anecdotes, one from each side. I applied for 200 bog-standard "C#/Python/TypeScript" enterprise developer jobs as someone with AWS experience. I heard crickets, and every application had hundreds of applicants - LinkedIn shows you.
Did I mention that, according to my resume (I only went back 10 years), I had 10 years of experience as a developer, including 2.5 leading AWS architecture at a startup and 3.5 actually working at AWS (ProServe)?
I had 8 jobs since 1996 and I’ve always been able to throw my resume up in the air and by the time it landed I would have three offers. LinkedIn showed that my application had hardly been viewed and my resume only downloaded twice.
Well, everything I said above is true. But it was really just an experiment while I was waiting for my plan A outreach to work: targeting companies in a niche of AWS where at the time I could reasonably claim to be one of the industry experts, with major open source contributions to a popular official "AWS Solution", and leaning on the network of directors, CTOs, etc. that I had established over the years.
None of them were looking for “human LLM code monkeys” that are a dime a dozen.
On the other hand, I’m in the hiring loop at my company. Last year we had over 6000 applicants and a 4% offer rate.
Who is going to absorb or need a bunch of mid level ticket takers in the future with AI improving? Or at least enough to absorb all of the ones who are currently being laid off and the ones coming in?
I will say that doing small modifications or asking a bunch of stuff fills the context the same, in my observation. It depends on your codebase and the rest of the stuff you use (subagents, skills, etc.).
I was once minimising the changes and trying to get the most out of it. I ran countless tests and variations. It didn't really matter much whether I told it to do it all or change one line. I feel Claude Code tries to fill the context as fast as possible anyway.
I am not sure how worthwhile Claude is right now. I still prefer it over Codex, but I am starting to feel that's just a bias.
I don’t think it’s bias: I have no love for any of these tools, but in every evaluation we’ve done at work, Opus 4.5 continually comes out ahead in real world performance
Codex and Gemini are both good, but slower and less “smart” when it comes to our code base
I have found Codex to be an exceptional code reviewer of Claude's work.
I hit the Claude limit within an hour.
Most of my tokens are used arguing with the hallucinations.
I’ve given up on it.
Do you use Claude Code, or do you use the models from some other tool?
I find it quite hard to hit the limits with Claude Code, but I have several colleagues complaining a lot about hitting limits and they use Cursor. Recently they also seem to be dealing with poor results (context rot?) a lot, which I haven't really encountered yet.
I wonder if Claude Code is doing something smart/special
In my case I've had it (Opus Thinking in CC) hit 80% of the 5-hour limit and 100% of the context window with one single tricky prompt, only to end up with worthless output.
Codex at least 'knows' to give up in half the time and 1/10th of the limits when that happens.
I don't want to be That Guy, but if you're "arguing with hallucinations" with an AI Agent in 2026 you're either holding it wrong or you're working on something highly nonstandard.
Your goal should be to run agents all the time, all in parallel. If you’re not hitting limits, you’re massively underutilizing the VC intelligence subsidy
https://hyperengineering.bottlenecklabs.com/p/the-infinite-m...
Hey thank you for calling out the broken link. That should be fixed now. Will make sure to track down the other broken links. We'll track down why loading is taking a while for you. Should definitely be snappier.
Is this the only announcement for Apple platform devs?
I thought Codex team tweeted about something coming for Xcode users - but maybe it just meant devs who are Apple users, not devs working on Apple platform apps...
Same here. From my experience, Codex usually knocks backend/highly "logical?" tasks out of the park, while it stumbles at times over fairly basic front-end/UI tasks.
But overall it does seem to be consistently improving. Looking to see how this makes it easier to work with.
Backends, regardless of language or framework, are often set in stone. There's a well-defined, most-used way to do everything, especially since most apps, when reduced, are CRUD. Frontend, by the nature of how frontend works, can be completely different from project to project if one wants to architect it efficiently.
Cool, looks like I'll stay on Cursor. All the alternatives come out buggy; Cursor cares a lot about developer experience.
BTW OpenAI should think a bit about polishing their main apps instead of trying to come out with new ones while the originals are still buggy.
(I work on Codex) One detail you might appreciate is that we built the app with a ton of code sharing with the CLI (the core agent harness) and the VS Code extension (the UI layer), so that as we improve any of them, we polish them all.
Any chance you'll enable remote development on a self-hosted machine with this app?
I.e., I think the Codex webapp on a self-hosted machine would be great. This is important when you need a beefier machine (potentially with a GPU).
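In the meantime, one stopgap (assuming something on the remote box serves a local web UI; the host name and port 8080 below are placeholders) is a plain SSH tunnel. This sketch just prints the command to run:

```shell
# Build the tunnel command that forwards the remote UI to this machine.
# Both values are placeholders, not real endpoints.
REMOTE="dev@gpu-box.example.com"   # hypothetical beefy remote host
PORT=8080                          # hypothetical port the UI listens on
TUNNEL="ssh -N -L $PORT:localhost:$PORT $REMOTE"
echo "$TUNNEL"   # run this, then browse to http://localhost:$PORT locally
```

`-N` opens the forward without running a remote command, so the heavy lifting stays on the remote machine while the browser stays local.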
Not going to solve your exact problem but I started this project with this approach in mind
https://github.com/jgbrwn/vibebin
This should be table stakes by now. That's the beauty of these cli tools and how they scale so well.
What are the benefits of using the codex webapp?
Working remotely with the app would truly be great
Interested in this as well.
Any reason to switch from vscode with codex to this app? To me it looks like this app is more for non-developers but maybe I’m missing something
Good question! VS Code is still a great place for deep, hands-on coding with the Codex IDE extension.
We built the Codex app to make it easier to run and supervise multiple agents across projects, let longer-running tasks execute in parallel, and keep a higher-level view of what’s happening. Would love to hear your feedback!
I already have multiple projects that I manage full-screen in VS Code; I just move from one to the other using "cmd" + "->". You should be aware that the Claude Code extension for VS Code is way better than the Codex extension, so perhaps you should work a bit on that as well. Even if the agents do 80% of the work, I still need to check what they do, and a familiar IDE seems the first choice of an existing/old-school developer.
OK, 'projects', but this would make a lot more sense if we could connect remotely to the projects, which already works without a problem using the IDE plugin. Right now I don't see any advantage to using this.
Awesome. Any chance we will see a phone app?
I know coding on a phone sounds stupid, but with an agent it’s mostly approvals and small comments.
The ChatGPT app on iOS has a Codex page, though it only seems to be for the "cloud" version.