That's a great example of how dangerous actions are perceived as innocent. The entire model of approving specific commands is absolutely bonkers.
npm run build = run an arbitrary shell command written in package.json
Meanwhile the agent could have done any of the following without approval:
- edited `package.json` to contain any arbitrary build command
- planted malicious code in `build.js` (called by `npm run build`)
- planted malicious code in `node_modules/xyz/index.js` (imported by `build.js`)
Yup. The most secure computer is one encased in concrete and dropped into the ocean.
Concrete alone isn't enough, you also need to have it be enclosed in a Faraday Cage.
that's a great point, and also the problem with relying on a human-in-the-loop to catch these kind of issues when it can be circumvented even if they were perfect
What would a better system look like?
Agents should make better use of OS sandboxing facilities with finer-grained ACLs.
Less: Do you want to run "npm run build"?
More: "npm run build" tried to read your Chrome cookie database, do you want to allow that?
Some agents like Codex use sandboxing on Linux/MacOS but the permissions are far too coarse - they'll run the command in a relatively strict sandbox and when it fails they'll ask you to allowlist the command as a whole, forever. There should be a new permission prompt every time a command tries to do something new.
Claude suggests (or used to suggest - it's been a while) to allowlist "bash" which completely defeats the point. If you do that the agent can run `bash -c "echo literally anything"`
Not using agents at all. It could edit your code to do something malicious when you run it. Not even once. Not even if the agent has a gun to your head.