That's a great example of how dangerous actions are perceived as innocent. The entire model of approving specific commands is absolutely bonkers.

npm run build = run an arbitrary shell command written in package.json

Meanwhile the agent could have done any of the following without approval:

- edited `package.json` to contain any arbitrary build command

- planted malicious code in `build.js` (called by `npm run build`)

- planted malicious code in `node_modules/xyz/index.js` (imported by `build.js`)

Yup. The most secure computer is one encased in concrete and dropped into the ocean.

Concrete alone isn't enough, you also need to have it be enclosed in a Faraday Cage.

that's a great point, and also the problem with relying on a human-in-the-loop to catch these kind of issues when it can be circumvented even if they were perfect

What would a better system look like?

Agents should make better use of OS sandboxing facilities with finer-grained ACLs.

Less: Do you want to run "npm run build"?

More: "npm run build" tried to read your Chrome cookie database, do you want to allow that?

Some agents like Codex use sandboxing on Linux/MacOS but the permissions are far too coarse - they'll run the command in a relatively strict sandbox and when it fails they'll ask you to allowlist the command as a whole, forever. There should be a new permission prompt every time a command tries to do something new.

Claude suggests (or used to suggest - it's been a while) to allowlist "bash" which completely defeats the point. If you do that the agent can run `bash -c "echo literally anything"`

Not using agents at all. It could edit your code to do something malicious when you run it. Not even once. Not even if the agent has a gun to your head.

[deleted]