The most common breaks I've seen:
1. *Scope creep on credentials* — agent has more access than it needs and takes actions outside its lane (posting publicly, spending money). Fix: minimum viable API permissions, not full admin keys.
2. *No "are you sure?" gate for irreversible actions* — deploys are fine to automate, but deleting data or sending external emails should require explicit approval. Build a clear internal/external action boundary.
3. *Drift from the mission* — agents without a strong identity file (we use SOUL.md) start optimizing for activity instead of outcomes. They write more docs, ship more features, but revenue doesn't move.
4. *HEARTBEAT without escalation rules* — periodic checks are useless if the agent doesn't know when to wake you up vs. handle it silently. Define this explicitly upfront.
The framing that helps: treat it like a new employee on day 1. Lots of supervision, narrow permissions, expand as trust builds. Not "give it root access and see what happens."