We keep seeing the same pattern: agents that can take high-impact actions (publishing, submitting, posting) with no verification layer between “the model decided to” and “it happened.” The fix isn’t post-hoc moderation, it’s action classification at the tool level. Every tool an agent can call should carry a risk rating, and high-impact actions should require explicit human confirmation before execution.
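Concretely, here’s a minimal sketch of what that gate could look like. Everything in it is illustrative (the `Risk` levels, the `Tool` registry, the `confirm` prompt are made up for this example, not any particular framework’s API); the point is only that the risk rating lives on the tool and the confirmation check runs before execution, not after.

```python
from enum import Enum
from dataclasses import dataclass
from typing import Callable, Any


class Risk(Enum):
    LOW = "low"        # read-only or local effects
    MEDIUM = "medium"  # reversible side effects
    HIGH = "high"      # publishing, submitting, posting


@dataclass
class Tool:
    name: str
    risk: Risk
    fn: Callable[..., Any]


def confirm(prompt: str) -> bool:
    """Placeholder human-in-the-loop gate; swap in your real approval flow."""
    return input(f"{prompt} [y/N] ").strip().lower() == "y"


def execute(tool: Tool, **kwargs: Any) -> Any:
    # High-impact actions never run without explicit human confirmation.
    if tool.risk is Risk.HIGH:
        if not confirm(f"Agent wants to run '{tool.name}' with {kwargs}. Allow?"):
            return {"status": "blocked", "reason": "human declined"}
    return tool.fn(**kwargs)


# Hypothetical registrations: names and callables are stand-ins.
tools = {
    "search_docs": Tool("search_docs", Risk.LOW, lambda query: f"results for {query}"),
    "publish_post": Tool("publish_post", Risk.HIGH, lambda body: f"published: {body[:40]}"),
}

if __name__ == "__main__":
    print(execute(tools["search_docs"], query="rate limits"))        # runs immediately
    print(execute(tools["publish_post"], body="draft announcement"))  # waits for a human
```

The design choice that matters is that the classification is attached to the tool at registration time, so the agent can’t talk its way around it at call time.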