If you haven't reviewed the code yet, how can you say it did 4 weeks of work in 2 days? You haven't verified the correctness, and besides reviewing the code is part of the work.

That's what I was getting at. With the review and potential rework time, we could be looking at over the original 4 week estimate. So then what's the point in using long running unsupervised agents if it ends up being longer than doing it in small chunks.