I've seen something similar with forecasting.

I get more accurate forecasts when separate agents each handle a different research task and summarize what they find, and those summaries are then fed to a final agent that uses judgment to make the forecast.

I think it is all about managing context: the forecasting agent only has to reason over the short summaries, not all of the raw research.
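
Here is a rough sketch of the setup I'm describing, not my actual code. Everything named here is a placeholder I made up for illustration: `Model` stands in for whatever LLM call you use (any function from prompt string to text), and `ResearchTask`, `run_research_agent`, `forecast`, and the `max_chars` cap are hypothetical. The point it tries to show is that each research agent starts with a fresh prompt and the forecaster only ever sees the capped summaries.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
Model = Callable[[str], str]


@dataclass
class ResearchTask:
    name: str          # short label, e.g. "base rates"
    instructions: str  # what this agent should look into


def run_research_agent(model: Model, question: str, task: ResearchTask,
                       max_chars: int = 500) -> str:
    """Run one research agent with its own fresh prompt; return a capped summary."""
    prompt = (
        f"Question: {question}\n"
        f"Task: {task.instructions}\n"
        "Summarize your findings briefly."
    )
    summary = model(prompt)
    # Hard cap so no single agent can flood the forecaster's context.
    return f"[{task.name}] {summary[:max_chars]}"


def forecast(model: Model, question: str, tasks: List[ResearchTask]) -> str:
    """Collect one summary per research task, then ask a final agent to forecast."""
    summaries = [run_research_agent(model, question, t) for t in tasks]
    prompt = (
        f"Question: {question}\n\n"
        "Research summaries:\n" + "\n".join(summaries) + "\n\n"
        "Give a probability and a short rationale."
    )
    # The forecaster sees only the question and the summaries.
    return model(prompt)
```

To try it, pass any function that maps a prompt to a string as `model`, plus a list of tasks such as `ResearchTask("base rates", "Find historical base rates")`. In a real setup the research agents would presumably also use tools or search, which this sketch leaves out.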