The sandbox-or-not debate is important but it's only half the picture. Even a perfectly sandboxed agent can still generate code with vulnerabilities that get deployed to production - SQL injection, path traversal, hardcoded secrets, overly permissive package imports.

The execution sandbox stops the agent from breaking out during development, but the real risk is what gets shipped downstream. I'm seeing more tools now that scan the generated code itself, not just contain the execution environment.

I find that a bit of a weird point.

The goal of such sandboxing is to let the agent freely write/execute/test code during development, so that it can propose a solution/commit without the human having to approve every dangerous step ("write a Python file, then execute it" is already a dangerous step). As the post says: "To safely run a coding agent without review".

You would then review the code, and use it if it's good. Turning many small reviews where you need to be around and babysit every step into a single review at the end.

What you seem to be asking for (shipping the generated code to production without review) is a completely different goal and probably a bad idea.

If there really were a tool that could "scan the generated code" so reliably that it's safe to ship without human review, that scanner could just be built into the tool that generates the code in the first place, and no separate scanning step would be needed. Sandboxing wouldn't be necessary either. So sandboxing wouldn't be "half the picture"; it would be unnecessary entirely, and your statement simplifies to "if we could auto-generate perfect code, we wouldn't need any of this".

Yeah I think we're actually agreeing more than it seems. I'm not arguing for shipping without review - more that the review itself is where things fall through.

In practice, that "single review at the end" is often a 500-line diff that someone skims at 5pm. The sandbox did its job, the code runs, tests pass. But the reviewer misses that the auth middleware doesn't actually check token expiry, or that there's a path traversal buried in a file upload handler. Not because they're bad at reviewing - because AI-generated code has different failure modes than human-written code and we're not trained to spot them yet.
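To make the "path traversal buried in a file upload handler" concrete, here's a toy sketch (paths and function names are made up for illustration). The vulnerable version is exactly the kind of thing that runs fine, passes tests with normal filenames, and sails through a skimmed review:

```python
import os

UPLOAD_DIR = "/srv/uploads"

# Vulnerable: os.path.join keeps "../" segments, and if the
# client-supplied name is absolute it discards UPLOAD_DIR
# entirely, so the final path can escape the upload directory.
def save_path_vulnerable(filename: str) -> str:
    return os.path.join(UPLOAD_DIR, filename)

# Safer: strip any directory components, then verify the
# normalized result still lives inside UPLOAD_DIR.
def save_path_checked(filename: str) -> str:
    candidate = os.path.normpath(
        os.path.join(UPLOAD_DIR, os.path.basename(filename)))
    if not candidate.startswith(UPLOAD_DIR + os.sep):
        raise ValueError("path escapes upload dir")
    return candidate
```

With a filename like `../../etc/passwd`, the vulnerable version resolves to `/etc/passwd` while the checked one stays inside `/srv/uploads`.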

Scanning tools don't replace review, they're more like a checklist that runs before the human even looks at it. Catches the stuff humans consistently miss so the reviewer can focus on logic and architecture instead of hunting for missing input validation.
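The "checklist" flavor of check is often mechanical enough to sketch in a few lines. Here's a toy version of one such rule (real scanners like semgrep or bandit do far more; the name list here is illustrative only):

```python
import ast

# Toy pre-review check: flag string literals assigned to names
# that look like secrets, so the human reviewer doesn't have to
# hunt for them by eye.
SECRET_NAMES = {"password", "secret", "api_key", "token"}

def find_hardcoded_secrets(source: str) -> list:
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if (isinstance(target, ast.Name)
                        and target.id.lower() in SECRET_NAMES
                        and isinstance(node.value, ast.Constant)
                        and isinstance(node.value.value, str)):
                    findings.append(node.lineno)  # line of the finding
    return findings
```

The point isn't that this is hard to write; it's that it runs on every diff at 5pm without getting tired.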

If that's the goal, why not just have Claude Code do it all from your phone at that point? When it's done, you pull down the branch and test it locally. Not 100% frictionless, but if it messes up an OS, it would be Anthropic's, not yours.

But that's what Anthropic uses: a sandbox. Now you can have your own.

Precisely! There's a fundamental tension:

1. Agents need to interact with the outside world to be useful
2. Interacting with the outside world is dangerous

Sandboxes provide a "default-deny policy", which is the right starting point. But current tools lack the right primitives to make fine-grained data access and data policy a reality.

Object-capabilities provide the primitive for fine-grained access. IFC (information flow control) for dataflow.
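For anyone unfamiliar with the ocap idea, here's a minimal sketch: instead of ambient authority (the agent can call `open()` on anything), the agent only receives a handle scoped to one resource and one operation. Python can't enforce true encapsulation, so treat this as a shape, not an implementation; real ocap systems rely on the language or runtime for that.

```python
from pathlib import Path

# A capability scoped to reading exactly one file. The holder
# never sees open() or a raw path it can widen.
class ReadCap:
    def __init__(self, path: Path):
        self._path = path  # private by convention only in Python

    def read(self) -> str:
        return self._path.read_text()

def agent_task(cap: ReadCap) -> int:
    # The agent's entire authority is this capability: it can
    # read this one file and nothing else.
    return len(cap.read())
```

IFC is the complement: it tracks where data that flowed *out* of a capability is allowed to go next.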

I agree. But how do you define these permissions when agent behavior is undefined?

> not just contain the execution environment.

See, my typical execution environment is a Linux VM or laptop, with a wide variety of SSH and AWS keys configured and ready to be stolen (even if they're temporary, that's enough to infiltrate prod, or do some sneaky lateral-movement attack). By contrast, a typical application execution environment is an IAM user/role with strictly scoped permissions.

Yeah this is the part that keeps me up at night honestly. The dev machine is the juiciest target and it's where the agent runs with the most access. Your ~/.ssh, ~/.aws, .env files, everything just sitting there.
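To put it bluntly, the attack surface is just a handful of well-known paths that any process running as the developer, agent included, can read. A sketch enumerating the usual defaults (list is illustrative, not exhaustive):

```python
from pathlib import Path

# Standard credential locations on a dev machine. Any process
# running as the developer can read whichever of these exist.
CANDIDATE_PATHS = [
    "~/.ssh/id_ed25519",
    "~/.ssh/id_rsa",
    "~/.aws/credentials",
    "~/.config/gh/hosts.yml",
]

def exposed_credentials() -> list:
    return [p for p in CANDIDATE_PATHS if Path(p).expanduser().is_file()]
```

No exploit needed; it's plain filesystem access under the agent's own uid.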

The NixOS microvm approach at least gives you a clean boundary for the agent's execution. But you're right that it's a different threat model from prod - in prod you've (hopefully) scoped things down, in dev you're basically root with keys to everything.