Hacker News

problem 2 is actually scarier than most people realize because it compounds. your agent reads a README in some dependency, that README has injection instructions, now the agent is acting on behalf of the attacker with whatever permissions you gave it. filesystem sandboxing doesnt help because the dangerous action might be "write a backdoor into the file i already have write access to" which is completely within the sandbox rules.

the short-lived scoped credentials approach someone mentioned upthread is probably the best practical mitigation right now. but even that breaks down when the agent legitimately needs broad access to do its job - like if its refactoring across a monorepo it kinda needs write access to everything.

i think the actual answer long term is something closer to capability-based security where each tool call gets its own token scoped to exactly what that specific action needs. but nobody has built that yet in a way that doesnt make the agent 10x slower.