Yeah, I've run tests similar to this while evaluating gpt 5.4 vs claude 4.6
Claude is more likely to figure out workarounds and get things deleted if I tell it to delete stuff, so it performs much better in this benchmark and I prefer it.
GPT is more likely to stop and prompt you "I got an error deleting this, should I try another way?", and since the operator gets more of these prompts, they'll hit continue more withut even reading it, so it ends up being more annoying for the operator and not really reducing the chance of it happening imo.
If your workflow for your llm says "delete the ec2-instance", and the ec2 api gives back "deletion protection is on", I want my llm to turn off deletion protection and delete it.
I feel like you're implying that the reverse result, prompting the user, is better, but I disagree with that.