After adding an adversarial review gate to implementation plans and code I saw large uptick in quality. I use Opus 4.8 as plan writer and orchestrator. For adversarial reviewer I use GPT 5.5.

I still find things to tweak and fix up but the amount dropped pretty dramatically. As always I am responsible for what I ship so I review and test everything of course. I still think we are a ways away from fully automated software forge but what is currently possible is pretty cool.