I'm shocked they didn't use subagents already. Maybe they're just talking about their web deployment being unified with Codex?

With Codex, subagents are only used if you specifically prompt for them, unlike Claude Code. That's odd, since it's the former that has excess compute available.

Deep Research has been using the Orchestrator -> Subagents -> Synthesizer loop since the beginning. It's just strange that they'd put a loop benchmark next to actual model benchmarks.
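
For anyone unfamiliar with the pattern, here's a rough sketch of what an Orchestrator -> Subagents -> Synthesizer pass looks like. This is only an illustration, not how Deep Research is actually implemented: the function names (`orchestrate`, `run_subagent`, `synthesize`, `research`) are made up, and the subagent is a stub standing in for a real model call.

```python
from concurrent.futures import ThreadPoolExecutor


def orchestrate(question: str) -> list[str]:
    # Hypothetical planner: split the question into independent sub-tasks.
    aspects = ("background", "evidence", "counterpoints")
    return [f"{question} :: {aspect}" for aspect in aspects]


def run_subagent(task: str) -> str:
    # Stub for a model call; a real subagent would search, read, and summarize.
    return f"notes on [{task}]"


def synthesize(question: str, findings: list[str]) -> str:
    # Hypothetical synthesizer: merge the subagent findings into one report.
    bullets = "\n".join(f"- {finding}" for finding in findings)
    return f"Report for '{question}':\n{bullets}"


def research(question: str) -> str:
    tasks = orchestrate(question)
    # Subagents run in parallel; pool.map returns results in task order.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        findings = list(pool.map(run_subagent, tasks))
    return synthesize(question, findings)


if __name__ == "__main__":
    print(research("Do subagents improve benchmark scores?"))
```

The point is that the score from something like `research(...)` depends on the scaffolding around the model as much as on the model itself, which is why it reads oddly next to single-model numbers.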

Maybe it's a tune of the base model that works especially well with the subagent loop?