I've seen something similar with forecasting.
I can get more accurate forecasts by having distinct agents do different research tasks, summarize, and then feed those summaries to an agent that uses judgment to forecast.
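I think it is all about managing context: each research agent works in its own context, and the forecasting agent sees only the condensed summaries instead of all the raw material.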
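Roughly, the shape of that setup looks like the sketch below. This is only an illustration, not my exact code: `Agent`, `run_pipeline`, `max_summary_chars`, and the stub agents are hypothetical names, and a real version would wrap whatever model API you use behind the `Agent` callable.

```python
from typing import Callable, List

# Hypothetical type: any function that takes a prompt and returns text.
# In practice this would wrap a call to whatever model you use.
Agent = Callable[[str], str]


def run_pipeline(
    question: str,
    research_tasks: List[str],
    researcher: Agent,
    forecaster: Agent,
    max_summary_chars: int = 500,  # assumed cap, purely illustrative
) -> str:
    """Run each research task in isolation, then forecast from summaries only."""
    summaries = []
    for task in research_tasks:
        # Each research call gets a fresh prompt: no history is shared
        # between tasks, so one task's raw material never leaks into another.
        prompt = (
            f"Question: {question}\n"
            f"Task: {task}\n"
            "Summarize your findings briefly."
        )
        summary = researcher(prompt)
        # Truncate so the forecaster's context stays small.
        summaries.append(summary[:max_summary_chars])

    # The forecaster sees the question and the summaries, nothing else.
    forecast_prompt = (
        f"Question: {question}\n\nResearch summaries:\n"
        + "\n".join(f"- {s}" for s in summaries)
        + "\n\nGive a probability and a short rationale."
    )
    return forecaster(forecast_prompt)


# Stub agents so the sketch runs without any model behind it.
def stub_researcher(prompt: str) -> str:
    task_line = prompt.splitlines()[1]
    return f"(summary for {task_line})"


def stub_forecaster(prompt: str) -> str:
    return f"(forecast based on {prompt.count('- (summary')} summaries)"


print(run_pipeline(
    "Will X happen by year end?",
    ["base rates", "recent news", "expert opinion"],
    stub_researcher,
    stub_forecaster,
))
```

The point of the structure is that the forecasting call never sees the raw research, only a few short summaries, so its context stays focused on the judgment step.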