What the hell is going on in this thread? This isn't good. The "threats" don't make sense. Oh no, all the sensitive information in my package.json...
Here's the threat model I (a luddite) use to evaluate these. The Claude Code harness can mostly be trusted, but the model cannot, because it is exposed to untrusted data from the internet and there is no separation of data and code in an LLM [0][1].
I want to avoid running untrusted code on my local machine, because it could steal secrets, install malware, etc.
Since the model can write to the project directory without restriction (I think), anything in the project directory is also untrusted. Running standard commands from the system is fine, as long as you know what those commands are going to do. Running anything from the local directory should be avoided because that code is untrusted.
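The rule above (system binaries are OK, anything living in or driven by the project directory is untrusted) can be sketched as a small command classifier. This is a hypothetical helper for illustration only, not part of Claude Code; `classify_command` and the `RUNS_PROJECT_CODE` set are assumptions, and the set is deliberately incomplete. It assumes a POSIX system.

```python
import os
import shlex
import shutil
from typing import Optional

# Hypothetical, incomplete list: system binaries that, run inside the project
# directory, may execute files the model could have written, e.g. package.json
# scripts (npm/yarn/pnpm), Makefiles (make), hooks in .git/hooks (git).
RUNS_PROJECT_CODE = {
    "npm", "npx", "yarn", "pnpm", "make", "cargo", "git",
    "python", "python3", "node", "bash", "sh",
}


def classify_command(command: str, project_dir: str, path: Optional[str] = None) -> str:
    """Hypothetical helper: label a shell command 'system' or 'untrusted'."""
    argv = shlex.split(command)
    if not argv:
        return "untrusted"
    exe = argv[0]
    project = os.path.realpath(project_dir)

    if os.sep in exe:
        # ./build.sh or bin/tool: resolved relative to the project directory.
        resolved = os.path.realpath(os.path.join(project, exe))
    else:
        # Bare name: look it up on PATH (or the explicit `path` given).
        found = shutil.which(exe, path=path)
        if found is None:
            return "untrusted"  # unknown binary: don't guess
        resolved = os.path.realpath(found)

    # Anything under the project directory may have been written by the model.
    if os.path.commonpath([resolved, project]) == project:
        return "untrusted"
    # A system binary can still run project files (package.json scripts, hooks).
    if os.path.basename(exe) in RUNS_PROJECT_CODE:
        return "untrusted"
    return "system"
```

Under these assumptions, `ls -la` resolved from a system directory comes back `"system"`, while `./run.sh` and `npm test` both come back `"untrusted"`: `npm` itself is a system binary, but `npm test` runs whatever script is in `package.json`, which the model can edit. A `PATH` entry pointing into the project directory is caught by the same path check.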
This is just one security model; there are many others! If a person is running Claude in a stronger sandbox, that changes the model considerably. What threat model do you use to evaluate whether an agent's actions are safe?
[0]: https://www.schneier.com/essays/archives/2024/05/llms-data-c...
[1]: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
If you think the worst that an agent can do is leak your package.json, your threat model is wayyy broken.