Partially agree, but I think "skill issue" undersells the genuine reliability problem the original post is describing.

The skill part is real: giving the agent the right context, scoping tasks to the right size, knowing when to intervene. Most people aren't doing those things well, and their results reflect it.

But the latent bug problem isn't really a skill issue. It's a property of how these systems work: the agent optimises for making the current test pass, not for building something that stays correct as requirements change. Round 1 decisions get baked in as assumptions that round 3 never questions — and no amount of better prompting fixes that.

The fix is treating agent-generated code with the same scepticism you'd apply to code from a contractor who won't be around to maintain it: more tests, explicit invariants, and not letting the agent touch the architecture without a human reviewing the design first.
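To make "explicit invariants" concrete, here's a minimal sketch (the function and its invariant are hypothetical, just for illustration): instead of trusting the example-based tests the agent wrote to make itself pass, you state the property that must hold and hammer it with many inputs, so a round-3 change that silently breaks a round-1 assumption gets caught.

```python
import random

# Hypothetical agent-written helper: applies a percentage discount.
def apply_discount(total: float, pct: float) -> float:
    return total - total * (pct / 100)

# The invariant, stated explicitly and checked across many random inputs,
# not just the handful of examples the agent tested against:
# a discounted total is never negative and never exceeds the original.
def check_invariant(trials: int = 1000) -> None:
    rng = random.Random(0)  # seeded for reproducibility
    for _ in range(trials):
        total = rng.uniform(0, 10_000)
        pct = rng.uniform(0, 100)
        result = apply_discount(total, pct)
        assert 0 <= result <= total, (total, pct, result)

check_invariant()
print("invariant holds")
```

If a later "make the test pass" edit breaks the property for any input, this fails loudly instead of shipping a latent bug. Libraries like Hypothesis do this more thoroughly, but even a seeded loop like the above is cheap insurance.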