Hacker News

The deny list problem is real, but I think the harder issue is how much context matters. Deleting a temp file and deleting a config file look the same to a classifier.

We've been approaching it from the policy side: define up front what the agent is allowed to do, evaluate each action against that policy before it runs, and require human approval for anything that falls outside it. Different tradeoffs, but the same underlying frustration.
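Roughly what a policy gate like that could look like, as a minimal Python sketch. All the names here (`Action`, `Rule`, `evaluate`, `run_with_gate`, the `delete_file` tool) are hypothetical, for illustration only, not any particular product's implementation. It's default-deny: an action auto-runs only if an explicit rule covers both the tool and the target path, so a delete under `/tmp` can be pre-approved while the same delete against a config file gets escalated to a human.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase
import posixpath
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class Action:
    tool: str    # hypothetical tool name, e.g. "delete_file"
    target: str  # POSIX path the action touches


@dataclass(frozen=True)
class Rule:
    tool: str
    target_glob: str  # shell-style pattern, matched case-sensitively


def evaluate(action: Action, policy: list[Rule]) -> Decision:
    """Allow only if a rule explicitly covers this tool AND target; else escalate."""
    # Normalize first so "/tmp/../etc/passwd" is checked as "/etc/passwd".
    target = posixpath.normpath(action.target)
    for rule in policy:
        if rule.tool == action.tool and fnmatchcase(target, rule.target_glob):
            return Decision.ALLOW
    return Decision.NEEDS_APPROVAL


def run_with_gate(
    action: Action,
    policy: list[Rule],
    approve: Callable[[Action], bool],   # asks a human; returns True to proceed
    execute: Callable[[Action], None],   # actually performs the action
) -> bool:
    """Evaluate before running; out-of-policy actions run only if a human approves."""
    if evaluate(action, policy) is Decision.NEEDS_APPROVAL and not approve(action):
        return False
    execute(action)
    return True


if __name__ == "__main__":
    policy = [Rule("delete_file", "/tmp/*")]
    for path in ["/tmp/build.log", "/home/me/.config/app.toml"]:
        print(path, evaluate(Action("delete_file", path), policy).value)
```

The point is that the rule keys on the target as well as the verb, which is exactly the context a plain deny list or action classifier throws away. The tradeoff is that anything you didn't anticipate in the policy lands in the approval queue.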