> Making models larger improves overall accuracy but doesn't reliably reduce incoherence on hard problems.

Coherence requires two opposing forces to hold it in one dimension, and at least three of them in higher dimensions of quality.

My team wrote up a paper titled "If You Want Coherence, Orchestrate a Team of Rivals"[1] because we kept finding that upping the reasoning threshold resulted in less coherence: more experimentation before hitting a dead end and turning around.

So we got better results from using Haiku (failing over to Sonnet) instead of Opus, and from using a higher-reasoning model to decompose tasks rather than perform each one of them.

Once a plan is made, the cheaper models do better because they do not second-guess their approach: they fail or they succeed, and they are not as tenacious as the higher-cost models.

When they fail hard and early, we can escalate to a higher authority and get out of that mess faster.
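Roughly, the loop looks like the sketch below; `call_model` and the tier names are placeholders for whatever inference API and models you use, not our actual stack:

```python
# Minimal sketch of the split described above: a higher-reasoning model writes the
# plan, cheap models execute it, and hard failures escalate quickly.
# call_model() and the tier names are placeholders, not a real provider API.

def call_model(tier: str, prompt: str) -> str:
    """Stand-in for an actual inference call to the given model tier."""
    raise NotImplementedError

def decompose(goal: str) -> list[str]:
    # Strategic side: the expensive model only produces the plan.
    plan = call_model("planner", f"Break this goal into small, independent tasks:\n{goal}")
    return [line.lstrip("- ").strip() for line in plan.splitlines() if line.strip()]

def execute(task: str) -> str:
    # Tactical side: cheapest tier first, a single fail-over, no dithering within a tier.
    for tier in ("cheap", "mid"):
        try:
            return call_model(tier, f"Do exactly this task, then stop:\n{task}")
        except Exception:
            continue  # fail hard and early; no retries at the same tier
    # Only after both worker tiers fail does the higher authority get involved.
    return call_model("planner", f"Both workers failed; handle this task yourself:\n{task}")

def run(goal: str) -> list[str]:
    return [execute(task) for task in decompose(goal)]
```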

Detailed knowledge of exactly how the failure happened seems to be less useful to the higher-reasoning model than to the action-biased models.

Splitting up the tactical and strategic sides of the problem seems to work for much the same reason that generals don't hold guns in a war.

[1] - https://arxiv.org/abs/2601.14351

> Coherence requires 2 opposing forces

This seems very basic to any kind of information processing beyond straight-shot, predictable transforms.

Expansion and reduction of possibilities, branches, scope, etc.

Biological and artificial neural networks: many signals converging, then reduced by competition between them.

Scientific theorizing, followed by experimental testing.

Evolutionary genetic recombination and mutation, winnowed back by resource competition.

Generation, reduction, repeat.

In a continually coordinated sense too. Many of our systems work best by encouraging simultaneous cooperation and competition.

Control systems: a command signal proportional to demand, vs. continually reverse-acting error feedback.
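A toy version of that pairing, with made-up gains and a made-up first-order plant (nothing here is taken from a real controller):

```python
# Feedforward command proportional to demand, paired with reverse-acting error
# feedback that pulls against any deviation from the setpoint. Gains and the
# plant are invented purely to illustrate the push-pull.

def track(setpoint: float, steps: int = 50) -> float:
    kf, kp = 1.0, 0.5            # feedforward and feedback gains (arbitrary)
    y = 0.0                      # plant output
    for _ in range(steps):
        push = kf * setpoint         # command proportional to demand
        pull = kp * (setpoint - y)   # error feedback, reverse-acting on the output
        u = push + pull
        y += 0.2 * (u - y)           # toy first-order plant response
    return y

print(track(10.0))  # settles at the demanded value
```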

> This seems very basic

Yes, this is not some sort of hard-fought wisdom.

It should be common sense, but I still see a lot of experiments which measure the sound of one hand clapping.

In some sense, it is a product of laziness to automate human supervision with more agents, but on the other hand I can't argue with the results.

If you don't really want the experiments and data from the academic paper, we have a white paper whose content will be completely obvious to anyone who has recently read High Output Management, The Mythical Man-Month, and A Philosophy of Software Design.

Nothing in there is new, except that the field it is applied to has no humans left.

> Yes, this is not some sort of hard-fought wisdom.

By basic I didn't mean uninteresting.

In fact, despite the pervasiveness and obviousness of the control and efficiency benefits of push-pull, generating-reducing, cooperation-competition, etc., I don't think I have ever seen any kind of general treatment or characterization that pulled all these similar dynamics together. Or a hierarchy of such.

> In some sense, it is a product of laziness to automate human supervision with more agents, but on the other hand I can't argue with the results.

I think it is the fact that the agents are each operating coherently toward their respective, complementary goals. Whereas asking one agent to both solve and judge creates conflicting constraints before a solution has even begun.

Creative friction.

I am reminded of brainstorming sessions, where it is so important to note ideas, but not start judging them, since who knows what crazy ideas will fit or spark together. Later they can be selected down.

So we institutionalize this separation/staging with human teams too, even if it is just one of us (within our context limits, over two inference sessions :).

More or less, delegation and peer review.
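In agent terms, it is something like the sketch below, with `call_model` standing in for two separate inference sessions (placeholders only, not the paper's setup):

```python
# Staging sketch: one session generates without judging, a second, fresh session
# selects. call_model() is a placeholder for an inference call in a new context.

def call_model(role: str, prompt: str) -> str:
    """Stand-in for an inference call made with a fresh context."""
    raise NotImplementedError

def brainstorm(problem: str, n: int = 5) -> list[str]:
    # Session 1: note ideas, judge nothing yet.
    return [call_model("generator", f"One idea, no critique, for: {problem}") for _ in range(n)]

def select(problem: str, ideas: list[str]) -> str:
    # Session 2: a context whose only, complementary goal is narrowing down.
    menu = "\n".join(f"{i + 1}. {idea}" for i, idea in enumerate(ideas))
    return call_model("reviewer", f"Problem: {problem}\nReturn the single best idea from:\n{menu}")
```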