This model is great at long-horizon tasks, and Codex now has heartbeats, so it can keep checking on things. Give it your hardest problem with verifiable constraints, the kind that would take hours, and you'll see how good this is :)
*I work at OAI.
Is there any task that actually doesn't require human intervention in between, even if it's just to set up stuff?
Like, I'll get Opus to make me an app, but it will stop partway through because I need to set up the db and plug in the API keys, and Opus really can't do that on its own yet.
> Is there any task that actually doesn't require human intervention in between, even if it's just to set up stuff?
The goal is none. The current situation: everything that matters requires human intervention.
I think the end situation will be that LLMs will be able to perform decently well in a highly controlled and predictable environment.
Will the Codex app support opening a fresh context window, rather than compaction, for "unrelated" sub-tasks during long-horizon tasks?
Could be a great feature, can't wait to test! I'm tired of other models (looking at you, Opus) constantly getting stuck mid-task lately.
Interesting. I just had Opus convert a 35k LOC Java game to C++ overnight (a root agent that orchestrated and delegated to sub-agents), woke up, and it was done and working.
What plan are you on? I'm starting to wonder if they're dynamically adjusting reasoning effort based on plan or something.
I'm on Max 5x and noticed this too. I don't use built-in subagents, but rather a full Claude session that orchestrates other full Claude sessions. Worker agents that receive tasks now stop midway and ask for permission to continue. My "heartbeat" is basically a "status. One line" message sent to the orchestrator.
Opus 4.6 worker agents never asked for permission to continue, and when a heartbeat was sent to the orchestrator, it just knew what to do (checked on subagents, etc.). Now it just says it's waiting for me to confirm something.
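The orchestrator/worker heartbeat pattern described here can be sketched roughly like this. Everything below is hypothetical: `WorkerSession` is a stub standing in for a real agent session (e.g. a CLI subprocess), and the messages are placeholders; the point is just that the orchestrator polls with a one-line status request and unblocks stalled workers itself instead of waiting for a human:

```python
import time

# Hypothetical stand-in for a real agent session; it simulates a worker
# that stalls once by asking for permission before resuming.
class WorkerSession:
    def __init__(self, name):
        self.name = name
        self.asked_permission = False

    def send(self, message):
        if message == "status. One line":
            if not self.asked_permission:
                self.asked_permission = True
                return "waiting for permission to continue"
            return "working: 3/5 files converted"
        if message == "continue, you have permission":
            return "ok, resuming"
        return "unknown command"

def heartbeat(workers, interval_s=0.0):
    """Poll each worker with a one-line status request and
    auto-approve any worker that stalls asking for permission."""
    log = []
    for w in workers:
        status = w.send("status. One line")
        log.append((w.name, status))
        if "permission" in status:
            # The orchestrator, not the human, unblocks the worker.
            log.append((w.name, w.send("continue, you have permission")))
        time.sleep(interval_s)
    return log

if __name__ == "__main__":
    for name, msg in heartbeat([WorkerSession("java-to-cpp"), WorkerSession("tests")]):
        print(f"[{name}] {msg}")
```

In a real setup the loop would run on a timer and `send` would talk to an actual session; the design point is that permission requests are handled inside the loop, so the human only sees the aggregated status.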
Weird, I don't see this behavior, although I did with Codex and 5.4, haha. I bet the providers are tweaking settings underneath and different users are routed to different deployments, or they're secretly routing us to different models under load.
This has to be bait.
Why?
what?
Because there’s no way in hell it can rewrite a 35k LOC game perfectly lol. Link the codebase or it didn’t happen.
I've been using the /ralph-loop plugin for Claude Code; it works well to keep the model hammering at the task.
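For anyone unfamiliar, the "Ralph" pattern is just an outer loop that re-feeds the same prompt until the agent leaves a completion marker. A minimal sketch, not the actual plugin: the `agent` function here is a stub standing in for whatever CLI runs your model, the DONE marker and prompt are made up, and the retry cap is a safety net so a stuck agent can't loop forever:

```shell
#!/bin/sh
# Minimal Ralph-style loop: re-run the same prompt until the agent
# creates a DONE marker, with a retry cap as a safety net.

# Stub standing in for the real agent CLI; it "finishes" on the
# third pass by creating DONE. Replace with your actual command.
agent() {
    echo "pass with prompt: $1"
    n=$(cat count 2>/dev/null || echo 0)
    if [ "$n" -ge 2 ]; then
        touch DONE
    fi
    echo $((n + 1)) > count
}

PROMPT="convert the Java game to C++; create DONE when finished"
MAX_ITERS=20

rm -f DONE count
i=0
while [ "$i" -lt "$MAX_ITERS" ] && [ ! -f DONE ]; do
    agent "$PROMPT"
    i=$((i + 1))
done
echo "stopped after $i iteration(s)"
```

The design point is that each pass starts fresh with the full prompt, so progress has to live in the repo (files, tests, the DONE marker) rather than in the model's context.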
It's genuinely so great at long-horizon tasks! In our internal evals at Canva, GPT-5.5 solved many long-horizon frontier challenges for the first time of any AI model we've tested :) Congrats on the launch!
Can we not do growth hacking here?
We totally agree.
That's what I've been heads down, HUNGRY, working on, looking for investors and founding engineers. Psst: https://heymanniceidea.com (disclaimer: I am not associated with heymanniceidea.com)
HN is owned by a startup accelerator and venture capital firm. They do growth hacking on the front page. And you probably know that since your throwaway account is several years old.
Sorry, what is "heartbeats", exactly?
> Today we launched heartbeats in Codex: automations that maintain context inside a single thread over time.
https://x.com/pashmerepat/status/2044836560147984461
Thanks!