Hacker News

This is not criticism of your project specifically, but a question for all tools in this space: What's stopping your agent from overwriting an arbitrary source file (e.g. index.js) with arbitrary code and running it?

A rogue agent doesn't need to run `rm -rf /`, it just needs to include a sneaky `runInShell('rm -rf /')` in ANY of your source code files and get it to run using `npm test`. Both of those actions will be allowed on the vast majority of developer machines without further confirmation. You need to review every line of code changed before the agent is allowed to execute it for this to work and that's clearly not how most people work with agents.

I can see value in projects like this to protect against accidental oopsies and messes, but I think that marketing tools like this as security tools is irresponsible - you need real isolation using containers or VMs.

Here's one more example showing why blacklisting doesn't work. It doesn't matter how fancy you try to make it, because you're fighting a battle that you can't win - there are effectively an infinite number of programs, flags, environment variables and config files that can be combined to execute arbitrary commands:

    bash> nah test "PAGER='/bin/sh -c \"touch ~/OOPS\"' git help config"

    Command:  PAGER='/bin/sh -c "touch ~/OOPS"' git help config
    Stages:
      [1] git help config → git_safe → allow → allow (git_safe → allow)
    Decision:    ALLOW
    Reason:      git_safe → allow

Alternatively:

    bash> nah test "git difftool -y -x 'touch ~/OOPS2' --no-index /etc/hostname /etc/hosts"
    Command:  git difftool -y -x 'touch ~/OOPS2' --no-index /etc/hostname /etc/hosts
    Stages:
      [1] git difftool -y -x touch ~/OOPS2 --no-index /etc/hostname /etc/hosts → git_safe → allow → allow (git_safe → allow)
    Decision:    ALLOW
    Reason:      git_safe → allow

Good catch, that's a legit bypass

nah strips env var prefixes before classifying the command but doesn't inspect their values for embedded shell execution. I'll fix it: https://github.com/manuelschipper/nah/issues/6
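For illustration only, here is a minimal Python sketch of what inspecting env var prefixes could look like. This is not nah's actual code: `EXEC_ENV_VARS`, `split_env_prefix` and `risky_env_vars` are hypothetical names, and the variable list is an assumption that is certainly incomplete (which is the blocklist problem raised above).

```python
import shlex

# Hypothetical, incomplete list of variables whose value some program
# may run as a command (pagers, editors, ssh wrappers, ...).
EXEC_ENV_VARS = {
    "PAGER", "GIT_PAGER", "MANPAGER",
    "EDITOR", "VISUAL", "GIT_EDITOR",
    "GIT_SSH_COMMAND", "GIT_EXTERNAL_DIFF",
}


def split_env_prefix(command):
    """Split leading NAME=value assignments from the rest of the command."""
    tokens = shlex.split(command)
    env = {}
    idx = 0
    while idx < len(tokens):
        name, sep, value = tokens[idx].partition("=")
        if not sep or not name.isidentifier():
            break  # first token that is not an assignment starts the command
        env[name] = value
        idx += 1
    return env, tokens[idx:]


def risky_env_vars(command):
    """Return the prefix variables that could cause command execution."""
    env, _ = split_env_prefix(command)
    return sorted(name for name in env if name in EXEC_ENV_VARS)


if __name__ == "__main__":
    cmd = """PAGER='/bin/sh -c "touch ~/OOPS"' git help config"""
    print(risky_env_vars(cmd))  # ['PAGER']
```

A classifier could downgrade any command with a non-empty `risky_env_vars` result from allow to ask, instead of stripping the prefix and classifying only `git help config`.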

On the broader write-then-execute point - two improvements are coming:

- Script execution inspection: when nah sees `python script.py`, it reads the file and runs content inspection and LLM analysis before execution

- LLM inspection for Write/Edit: for content that's suspicious but doesn't match any deterministic pattern, route it to the LLM for a second opinion
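A rough sketch of what the deterministic half of that inspection might look like, assuming simple regex matching. The pattern table and the function names (`inspect_script_text`, `inspect_script`) are hypothetical and not taken from nah; a real tool would need far more patterns and would still miss obfuscated code.

```python
import re
from pathlib import Path

# Hypothetical patterns for illustration; trivially evadable by obfuscation.
SUSPICIOUS_PATTERNS = {
    "recursive rm": re.compile(r"\brm\s+-[a-zA-Z]*[rR]"),
    "file removal call": re.compile(r"\b(os\.(unlink|remove)|shutil\.rmtree)\s*\("),
    "shell execution": re.compile(r"\b(os\.system|subprocess\.\w+)\s*\("),
}


def inspect_script_text(text):
    """Return the labels of all suspicious patterns found in the text."""
    return [label for label, pattern in SUSPICIOUS_PATTERNS.items()
            if pattern.search(text)]


def inspect_script(path):
    """Read a script from disk and inspect its content before it is run."""
    return inspect_script_text(Path(path).read_text(errors="replace"))
```

Any non-empty result would be a reason to prompt the user; an empty result is not proof that the script is safe, which is where the LLM second opinion would come in.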

Won't close it 100% - to your point a sandbox is the answer to that.

I don't think "security tool" and "not a sandbox" are contradictory though. Firewalls don't replace OS permissions, OS permissions don't replace encryption.

nah is just another layer that catches the 95% that's structurally classifiable. It's a different threat model. If 200 IQ Opus is rogue, deterministic tools or even adversarial one-shot LLMs won't be able to do much to stop it...

netcoyote a month ago

> What's stopping your agent from overwriting an arbitrary source file (e.g. index.js) with arbitrary code and running it?

You're absolutely right :)

And even if it could be sandboxed at the source code level, what's to prevent a nefarious AI from writing an executable file directly as bytes that calls (e.g.) `unlink`?

schipperai a month ago

nah inspects Write and Edit content before it hits disk, so destructive patterns like os.unlink, rm -rf, and shell injection get flagged. And executing the result (./evil) classifies as unknown, which resolves to ask: the LLM can then choose to block it or ask you to approve.

But yeah, a truly adversarial agent needs a sandbox. It's a different threat model - nah is meant to catch the trusted but mistake-prone coding CLI, not a hostile agent.

riteshkew1001 a month ago

>you need real isolation using containers or VMs.

agreed on isolation. the ROME thing from alibaba is worth reading here (https://arxiv.org/abs/2512.24873): agent escaped its sandbox during RL training, mined crypto on training GPUs, opened reverse SSH tunnels to external IPs. nobody prompted it. reward optimization just found novel paths, and their firewall caught it, not the sandbox. And then there was this HN thread itself about the Snowflake AI agent (https://news.ycombinator.com/item?id=47427017)

separate problem that's not in this thread yet: tool descriptions themselves can be the attack vector. MCP tools self-report what they do in the manifest with zero verification. i've looked at thousands of these and found tools saying "read config" while the implementation phones home. classification trusts the label, sandbox constrains runtime, neither catches the tool lying about what it is.

dns_snek a month ago

> Firewalls don't replace OS permissions, OS permissions don't replace encryption

Of course, but the crucial difference is that these operate using an allow list, not a block list.

If I extend the analogy, if my OS required me to block-list every user who shouldn't have access to my files then I wouldn't trust that mechanism to provide a security barrier. If my firewall worked in such a manner that it allowed all traffic by default and I had to manually block every attacker on the public internet then I wouldn't rely on it either.

My own analogy is that this is a bit like saying that you want a relatively safe car and then buying one without any airbags or seatbelts, and thinking it's fine because it has lane departure warnings and automatic braking. I've got nothing against you personally, I just find this sort of viewpoint extremely puzzling (and oddly common). I make the same criticism when people just disable post-install scripts instead of using a sandbox.

allowlists are stronger than blocklists - that's not debatable and I'm right there with you

but nah isn't a pure blocklist - anything that doesn't match a known pattern classifies as unknown, which defaults to ask (user gets prompted). It's not "allow all traffic, block each attacker", it's "allow known-safe, block known-dangerous, prompt for everything else".
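The three-way decision described here can be sketched in a few lines of Python. The rule tables and the `classify` function are hypothetical placeholders, not nah's real rules or API; the only thing taken from the thread is the allow / block / ask split with ask as the default.

```python
import shlex

# Hypothetical rule tables: each rule is a tuple of leading tokens.
KNOWN_SAFE = [("git", "status"), ("git", "log"), ("ls",)]
KNOWN_DANGEROUS = [("rm", "-rf"), ("git", "push", "--force")]


def classify(command):
    """Return 'allow', 'block', or 'ask' for a shell command string."""
    tokens = tuple(shlex.split(command))

    def matches(rules):
        # A rule matches when the command starts with the rule's tokens.
        return any(tokens[:len(rule)] == rule for rule in rules)

    if matches(KNOWN_DANGEROUS):
        return "block"
    if matches(KNOWN_SAFE):
        return "allow"
    return "ask"  # unknown commands fall through to a user prompt


if __name__ == "__main__":
    for cmd in ("git status", "rm -rf /", "./evil"):
        print(cmd, "->", classify(cmd))
```

The default-to-ask branch is what distinguishes this from a pure blocklist. The weak spot is the allow branch: as the `git difftool -x` example earlier in the thread shows, a command can match a "safe" prefix and still execute arbitrary code.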

the analogy doesn't carry that far... it's a different threat model: nah isn't containing rogue agents or adversarial actors, it's a guardrail for a trusted but mistake-prone agent.

maybe more akin to a junior employee accidentally dropping the database cause they didn't know better. but how are they supposed to work on prod? They ask "boss, can I run this? SELECT customer, sales FROM SALES.PROD..." You say: cool, you don't have to ask me again for SELECT (nah allow db_read).

But then they can ask: "can I run this? drop SALES.PROD?".... hmmm, nah.