The solution is proxy everything. The agent doesn't have an api key, or yoyr actual credit card. It has proxies of everything but the actual agent lives in a locked box.

Control all input out of it with proper security controls on it.

While not perfect it aleast gives you a fighting chance when your AI decides to send a random your SSN and a credit card to block it.

> with proper security controls on it

That's the hard part: how?

With the right prompt, the confined AI can behave as maliciously (and cleverly) as a human adversary--obfuscating/concealing sensitive data it manipulates and so on--so how would you implement security controls there?

It's definitely possible, but it's also definitely not trivial. "I want to de-risk traffic to/from a system that is potentially an adversary" is ... most of infosec--the entire field--I think. In other words, it's a huge problem whose solutions require lots of judgement calls, expertise, and layered solutions, not something simple like "just slap a firewall on it and look for regex strings matching credit card numbers and you're all set".

Yeah i'm deffinetly not suggesting it's easy.

The problem simply put is as difficult as:

Given a human running your system how do you prevent them damaging it. AI is effectively thr same problem.

Outsourcing has a lot of interesting solutions around this. They already focus heavily on "not entirely trusted agent" with secure systems. They aren't perfect but it's a good place to learn.

> The solution is proxy everything.

Who knew it'd be so simple.

Unfortunately I don't think this works either, or at least isn't so straightforward.

Claude code asks me over and over "can I run this shell command?" and like everyone else, after the 5th time I tell it to run everything and stop asking.

Maybe using a credit card can be gated since you probably don't make frequent purchases, but frequently-used API keys are a lost cause. Humans are lazy.

Per task granular level control.

You trust the configuration level not the execution level.

API keys are honestly an easy fix. Claude code already has build in proxy ability. I run containers where claude code has a dummy key and all requestes are proxied out and swapped off system for them.