I can't shake off the feeling that Google's Deep Think models are not really different models, just the old ones run with a higher number of parallel subagents, something you could do yourself with their base model and opencode.

They could do it this way: generate 10 reasoning traces, and every N tokens prune the 9 with the lowest likelihood, then continue from the highest-likelihood trace.
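The pruning idea above resembles beam search with periodic re-branching. A minimal toy sketch, where `step` is a hypothetical stand-in for one decoding step (a real version would need the model's per-token log-probabilities):

```python
import math
import random

def step(loglik, rng):
    """Toy stand-in for one decoding step: extend a trace's cumulative
    log-likelihood by one token's log-probability. A real implementation
    would read this from the model's logits before sampling."""
    return loglik + math.log(rng.uniform(0.1, 1.0))

def pruned_search(num_traces=10, prune_every=8, total_steps=32, seed=0):
    rng = random.Random(seed)
    traces = [0.0] * num_traces  # cumulative log-likelihood per trace
    for t in range(1, total_steps + 1):
        traces = [step(ll, rng) for ll in traces]
        if t % prune_every == 0:
            # prune the 9 weakest traces, then branch 10 fresh
            # continuations from the surviving best trace
            traces = [max(traces)] * num_traces
    return max(traces)
```

This only sketches the bookkeeping; the actual model calls and likelihood readout are the part you can't do from outside the API.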

This is a form of task-agnostic test-time search that is more general than multi-agent parallel-prompt harnesses.

10 traces makes sense because ChatGPT 5.2 Pro is 10x more expensive per token.

That's something you can't replicate without access to the network's output before token sampling (the logits).

And after I do that, how do I combine the output of 1000 subagents into one output? (I'm not being snarky here; I think it's a nontrivial problem.)

You just pipe it to another agent that does the reduce step (fan-in) of the map-reduce, after the subagents have done the map step (fan-out).
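The fan-out/fan-in shape can be sketched in a few lines. Here `run_agent` is a hypothetical placeholder for whatever LLM call your harness makes; it's stubbed as an echo so the structure is visible:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(prompt):
    """Hypothetical agent call; stubbed as an echo for illustration."""
    return f"answer({prompt})"

def map_reduce(task, subtasks):
    # fan-out (map): each subagent gets one focused subtask,
    # with its whole context window to itself
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run_agent, subtasks))
    # fan-in (reduce): a final "reducer" agent merges the partial answers
    merge_prompt = f"Merge these answers to '{task}': " + "; ".join(partials)
    return run_agent(merge_prompt)
```

The reducer is itself just another agent call, which is the whole point of the thread.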

It's agents all the way down.

The idea is that each subagent focuses on a specific part of the problem and can use its entire context window for a more focused subtask than the overall one. So ideally the results aren't conflicting, they're complementary, and you just have a system that merges them, likely another agent.

Claude Cowork does this by default, and you can see exactly how it coordinates them.

Start with 1024 and halve the number of agents each round to distill the final result.
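That halving scheme is a tournament-style reduction: pair up outputs, merge each pair with an agent call, and repeat until one remains. A sketch, assuming a power-of-two number of outputs and a hypothetical `merge` function standing in for the merging agent:

```python
def distill(outputs, merge):
    """Halve the field each round by merging adjacent pairs until one
    output remains. Assumes len(outputs) is a power of two."""
    rounds = 0
    while len(outputs) > 1:
        outputs = [merge(outputs[i], outputs[i + 1])
                   for i in range(0, len(outputs), 2)]
        rounds += 1
    return outputs[0], rounds

# 1024 drafts collapse to one result in log2(1024) = 10 merge rounds
final, rounds = distill([f"draft{i}" for i in range(1024)],
                        merge=lambda a, b: f"merged({a},{b})")
```

The appeal is that each merge only ever sees two inputs, so no single reducer has to hold all 1024 outputs in its context window at once.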