> Additionally, we’re introducing a new `ultra` mode that goes beyond the capabilities of a single agent by leveraging subagents to accelerate complex work.

I'm curious how this works. Do the subagents get to use the same tools? Will the client be flooded with tool calls? Why charge extra for a new "model" when the same thing can be done in the client with more control?

And if it's an army of subagents, why do they compare it to Fable and Mythos? I'm guessing those models would bench better with a similar harness.

If it's anything like Claude Code's ultracode, it's nothing new or revolutionary.

It's essentially a bunch of subagents called by a deterministic script that the main model thread writes, each eating tokens for lunch, with their output synthesized by an orchestrator agent.
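For anyone who hasn't seen the pattern, here is a minimal sketch of that fan-out-and-synthesize loop. Everything in it is an assumption for illustration: `run_subagent` is a hypothetical stand-in for a real model call, the token count is a made-up proxy, and none of this is taken from the actual Codex or Claude Code harness.

```python
import asyncio


async def run_subagent(task: str) -> dict:
    # Hypothetical stand-in for a real model call: it does no actual work
    # and just pretends each word of the task costs 100 tokens.
    await asyncio.sleep(0)
    return {
        "task": task,
        "result": f"done: {task}",
        "tokens": len(task.split()) * 100,
    }


async def orchestrate(tasks: list[str]) -> dict:
    # Fan out: every subagent runs concurrently on its own task.
    results = await asyncio.gather(*(run_subagent(t) for t in tasks))
    # Synthesize: a real orchestrator would make another model call here;
    # this sketch just joins the results and adds up the token bill.
    return {
        "summary": "; ".join(r["result"] for r in results),
        "total_tokens": sum(r["tokens"] for r in results),
    }


def run(tasks: list[str]) -> dict:
    return asyncio.run(orchestrate(tasks))
```

The point is that the fan-out itself is plain scripting. The token bill grows with the number of subagents, plus whatever the synthesis step costs.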

The fact that it's even named Ultra is pretty telling.

Ultra expensive

The confusion is that ultracode is not a different model with its own benchmarks.

Neither is OpenAI's ultra. The article specifically calls it a 'mode', and it's not even mentioned in the model card.

It's for sure a Codex harness feature.

EDIT: yeah, it's the same thing. https://github.com/openai/codex/blob/main/codex-rs/core/test...

>> If it's anything like Claude Code's ultracode, it's nothing new or revolutionary.

OpenAI flat out copying Anthropic is a pretty funny development. It's strong evidence that they've been in catch-up mode.

Eh, pretty much everyone that spent some time tweaking their harness already had a homemade 'ultracode' long before Anthropic did it.

OpenAI is just way more careful with what features they add or enable by default in their harness. Anthropic's harness is a junk drawer of random features, with a new feature added every few hours. It feels like they're in panic mode, dropping random things to see what sticks when models are eventually commoditized.

I prefer the OpenAI way - slow and steady.

Don’t all the major harnesses (pi, Claude Code, Codex) use subagents? Definitely if you direct them to, but I’ve seen at least pi spin them up without explicit instruction.

Absolutely yes

With pi they’re an extension, but that’s pi

Which specific subagent extension do you use?

If it's anything like Claude's ultracode, it burns 3 million tokens in half an hour with a single prompt.
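To put that figure in perspective, here is a quick back-of-the-envelope calculation. The 3 million tokens and the half hour come from the comment above; the price per million tokens is a placeholder assumption, not a real rate for any provider.

```python
def burn_rate(tokens: int, minutes: float) -> float:
    """Average tokens consumed per second over the run."""
    return tokens / (minutes * 60)


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of a run at a flat per-million-token price (hypothetical price)."""
    return tokens / 1_000_000 * usd_per_million


tokens = 3_000_000   # figure quoted in the comment above
minutes = 30         # "half an hour"
price = 10.0         # ASSUMPTION: placeholder USD per million tokens

rate = burn_rate(tokens, minutes)   # roughly 1,667 tokens per second
total = cost_usd(tokens, price)     # 30.0 USD at the placeholder price
```

Swap in whatever blended input/output price applies to your plan. The rate is the part that doesn't depend on pricing.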

Sounds like an Agent using an Agent like Mr. Meeseeks.

Yeah, I'm interested too. My guess for the reason, if not purely to eke out more performance, is so they can cleanly gather real-world data on this kind of usage.

I'm shocked they didn't use subagents already. Maybe they're just talking about their web deployment being unified with Codex?

With Codex, subagents are only used if you specifically prompt for them, unlike Claude Code. Odd, since OpenAI is the one with excess compute available.

Deep Research has been using the Orchestrator -> Subagents -> Synthesizer loop since the beginning. It's just strange that they'd put a loop benchmark next to actual model benchmarks.

Maybe it's a tune of the base model that works especially well with the subagent loop?

Claude also has an ultracode mode, which is exactly the same thing. This seems to be different from pro, however.

> Will the client be flooded with tool calls?

I was just saying to colleagues that I hadn't felt the need to go past an 8-core machine until this month, when I started running parallel GPT 5.5 agents on a decent-sized codebase (over 4 MB of code). There were times I could barely move my mouse cursor!
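If you drive the agents from your own script, one way to keep the machine usable is to cap how many run at once. This is a rough sketch assuming an asyncio-based runner. `run_agent` is a hypothetical placeholder for launching a real agent, and the sleep stands in for its work.

```python
import asyncio


async def run_agent(i: int, sem: asyncio.Semaphore, state: dict) -> int:
    # The semaphore lets at most `max_parallel` agents past this point at once.
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # placeholder for the agent's real work
        state["active"] -= 1
    return i


async def run_all(n_agents: int, max_parallel: int) -> tuple[list[int], int]:
    sem = asyncio.Semaphore(max_parallel)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(run_agent(i, sem, state) for i in range(n_agents))
    )
    # Return the results plus the highest concurrency actually observed.
    return list(results), state["peak"]
```

For example, `asyncio.run(run_all(10, 3))` schedules ten agents but never lets more than three run at once. Setting the cap a little below your core count leaves some headroom for the desktop.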