I am a bit confused about which part you specifically disagree with. Reading AI responses and reviewing code seems to be what you propose as well.
Your MILP example is not something this approach would prevent; it would surface during the planning phase.
I guess the devil is in the details, and the way you prompt it when starting the task matters.
But IMO you absolutely need to check the output, engage with what the model is doing, and probe why something is built the way the model tries to build it.
I disagree with keeping an eye on the model as it is working, approving every command, and denying and stopping the model when you think it has gone wrong. It is not that doing this is actively harmful, but rather that it is a waste of time, and you can avoid the need for it through better design discussions and review.
Micromanaging and keeping the AI on a "short leash" also lends itself to telling models to do smaller units of work at a time instead of discussing broader design concerns. That is why I think someone working this way would miss the MILP solution: they might never discuss the overall design with the model, but rather just tell it what to implement next.
I personally am somewhere between you and the author. I don't check _all_ the intermediary steps, but I do try to understand what it's doing [1] and follow the process. Mostly I let it make the changes itself without supervising each step, but when a coherent "chunk" of work is done, I go through it really thoroughly. In almost 90% of cases, some adjustments are needed after a chunk is done.
I find broad architectural design to be _better_ if you follow along in the process, because you understand the direction it's going sooner and can shift the high-level direction much earlier. Even if you check its steps, you can ask it for its take on high-level architectural aspects along the way, no problem. I think the personal touch matters a lot, though, because I naturally ask it such questions and try to get the big picture.
[1] I actually find it really instructive to see what tooling it uses to tackle a problem; I've become a much better console user because of it.
I agree. Better to let it rip in a sandbox and then spend your time correcting the finished product.
Being in the middle is a waste of time.