> You/we are all susceptible to this sort of thing, and I call BS on anyone who says they check every little thing their agent does with the same level of scrutiny as they would if they were doing it manually.
Why? I do exactly that. I give it broad permissions, but I also give it very specific instructions, especially when it comes to deleting resources. I work in small chunks and review before committing, and I push before starting another iteration, so that if something goes wrong I have a known-good state I can easily restore.
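The loop I mean is nothing fancy. As a sketch, with a throwaway local bare repo standing in for the remote (all the paths and names below are just examples):

```shell
# small chunk -> review -> commit -> push, so there's always a restorable state
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"       # stand-in "remote"
git init -q "$tmp/work" && cd "$tmp/work"
git remote add origin "$tmp/origin.git"

echo "v1" > notes.txt                      # first small chunk
git add notes.txt
git -c user.email=me@example.com -c user.name=me commit -q -m "chunk 1"
git push -q -u origin HEAD                 # known-good state saved before the next iteration

echo "v2" > notes.txt                      # the agent's next change
git diff                                   # review the chunk before committing
git add notes.txt
git -c user.email=me@example.com -c user.name=me commit -q -m "reviewed chunk"
git push -q origin HEAD                    # if the next run goes wrong: git reset --hard @{upstream}
```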
I'm the one with the brain. The LLM can regurgitate a ton of plumbing and save days of sifting through libraries, but it'll still get something wrong, because at its core it's still a probabilistic output generator. No matter how good it becomes, it cannot judge whether it's doing something a human would immediately spot as "stupid".
Letting the agent interact with and fix API calls automatically is something that normally works for me, but I'd never have let it execute a `terraform destroy` on its own; I'd have been very specific about that.
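And "very specific" can be enforced mechanically rather than just written into the prompt. A hypothetical guard function (the name `tf_guard` and the `CONFIRM_DESTROY` variable are made up for this sketch) that the agent's shell could be forced through might look like:

```shell
# Hypothetical guard for destructive terraform subcommands. In real use this
# would sit ahead of terraform in the agent's PATH; here it just echoes the
# decision so the behavior is visible. CONFIRM_DESTROY is an invented knob
# that only a human would set.
tf_guard() {
    case "${1:-}" in
        destroy)
            if [ "${CONFIRM_DESTROY:-}" != "yes" ]; then
                echo "blocked: terraform destroy requires CONFIRM_DESTROY=yes" >&2
                return 1
            fi
            ;;
    esac
    # a real wrapper would now run: command terraform "$@"
    echo "allowed: terraform $*"
}

tf_guard plan                              # non-destructive, passes through
tf_guard destroy || echo "destroy was refused"
```

The point is that the refusal happens outside the model: no amount of confident token generation gets past an `exit 1`.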