Hacker News

This is cool for testing the model side, but the really scary part is what happens after the injection succeeds. Even if your agent fails only 3 out of 10 injection tests, that's roughly a 30% chance per attempt that it exfiltrates whatever secrets are in its environment. The defense can't just be "hope the model catches it." You need architectural controls on the egress side too.
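To make "controls on the egress side" concrete, here's a rough sketch of one such control: a default-deny allowlist check that every outbound request from the agent's tools would have to pass, no matter what the model was talked into. The function name `egress_allowed` and the hostnames are made up for illustration, and this assumes all agent traffic actually goes through a chokepoint where you can run the check.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: these hostnames are placeholders for illustration only.
ALLOWED_HOSTS = frozenset({"api.example.com", "docs.example.com"})


def egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is exactly on the allowlist.

    Default-deny: anything that fails to parse, uses another scheme,
    or targets an unlisted host is rejected.
    """
    try:
        parts = urlsplit(url)
        host = parts.hostname  # already lowercased; userinfo and port stripped
    except ValueError:
        return False
    if parts.scheme != "https" or host is None:
        return False
    # Exact match only, so "api.example.com.evil.net" does not slip through.
    return host in allowed_hosts
```

An in-process check like this is only one layer. If the agent can run arbitrary code or shell commands, the same rule would need to be enforced outside the process (network policy or an egress proxy), since the model can't be trusted to call the check on itself.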