> But on the other hand... this is a robust reminder that coding agents can do anything you can do by typing commands into a terminal—and frontier models know every trick in the book and evidently a few that nobody has ever written down before.
> Running coding agents outside of a sandbox has always been a bad idea
I'm continually bemused and astonished by the number of people who clearly acknowledge that it's reckless to give agents full access to your machine, and keep doing it anyway.
It's like posting a video of yourself in the passenger seat of a car, with your feet up on the dashboard, and saying: "Remember, if you're doing this and you get in a crash, the airbags are likely to break your legs or worse! Boy, I sure am glad that didn't happen to me!"
You’ve picked an interesting example, as driving a car, even with all safety precautions, is pretty much the most dangerous activity we do on a daily basis. Yet somehow we decide that the benefits outweigh the risks.
It's a completely different story. For cars, it happened because of relentless pressure from the auto lobby. It took years of propaganda from oil companies, car makers etc. to make us think the road is for cars [1]. We demolished and rebuilt entire cities to accommodate cars, partly because they gutted the public transport sector [2]. This made our infrastructure so hostile to our own bodies that we have no choice but to use cars now. We bought their products because they forced them down our throats. There is nowhere near that kind of pressure behind the adoption of... oh dear lord.
[1] https://www.todayifoundout.com/index.php/2022/06/how-lobbyis...
[2] https://en.wikipedia.org/wiki/General_Motors_streetcar_consp...
I don't think the pressure of the auto lobby is really the reason.
People feel cars are more convenient and more prestigious than riding on a bus. Car lobby certainly accelerated the process, but car users were the main driving force.
The auto lobby invented the word jaywalking to shift the liability for dead pedestrians from the people doing the killing to the people doing the walking.
The US also had protests when drivers killed kids, but they were ultimately unsuccessful, except for the odd traffic light installation. https://medium.com/vision-zero-cities-journal/the-baby-carri...
Even in Amsterdam the original "stop the child murder" protests only barely succeeded, and it took a massive oil crisis and a population that could still (if only just) remember what life was like before cars took over their city to get there.
Uses change and laws need to keep up. Lobby or not, jaywalking is a reasonable thing to be illegal because when cars became common enough, walkers in their way caused an overall loss for everyone. People also used to be allowed to walk on the train tracks freely when trains were slower and more obvious - did the train lobby invent the word "foamer"? Should we make rail corridors train-free? Computer hacking became illegal during my lifetime to shift liability for faulty software and incompetence from the operators to the users. Before that, it didn't really matter because nobody was using the internet for anything important. Friends used to hack each other for fun. Bitcoin used to be a wild west where people would openly steal from or fool each other for sport - I don't think people really saw it as money or property when you could just generate it with your computer.
> Car lobby certainly accelerated the process, but car users were the main driving force.
Not really. We know it’s not as much of a natural force as some would like it to be because there are places where the lobbies lost, and while cars are common and widespread they’re nowhere near as dominant as they are in, say, the USA.
NJB’s next video (currently available on Nebula) is about exactly that, Amsterdam’s (/ De Pijp’s) resistance to cars and car lobbying.
Subsidies played a huge role, including the eminent domain bulldozing of cities for free-at-use highways. If people had to pay upfront for those costs, the urban landscape would look much different (probably closer to Japanese cities, which do have massive suburbs, but centred around train stations).
Yet Japan does still have cars (and a car culture even), they're just not necessarily the default or dominant mode of transport.
Sure, nobody is saying cars are useless or unfun, I'm just pushing back against the idea that everything car everywhere is a natural and intrinsic outcome from cars existing. As I noted, even in the Netherlands cars are common: the Dutch have a very dense road network and a fair number of cars.
I think we're on the same page.
For me, cars are a perfectly fine mode of transport, but the way so many places prioritize it over alternatives (whatever the reason) isn't necessarily better.
My "wtf" moment was 20 years ago when I was visiting my cousin in an exurb and we sat in a line of cars for over 40 minutes waiting for our turn to pick up her kid. The messed up part was that while there were school busses, everything was so spread out that the bus ride for them would have been over an hour and then another 20 minute walk from the arterial road drop-off point to their house. Everything was far away, including local public parks.
Isn't Not Just Bikes some US expat/biking maximalist?
I'm not sure I'd take him as some neutral authority on the history of cars and driving in Europe.
> Isn't Not Just Bikes some US expat/biking maximalist?
According to their videos, they prefer trams within cities; generally take trains between cities; and acknowledge that cars are very useful for places which aren't so well connected (e.g. places that are far apart which aren't on a train line). They think encouraging the use of cars within cities is a bad idea (dangerous, scales poorly, makes those areas less pleasant to be, etc.).
Not what I'd think of as a "biking maximalist".
They do show themselves cycling to places that are nearby. Does that make Youtubers who record videos in their car "driving maximalists"?
I wasn't very familiar with the channel, sorry.
Not US expat either (or not yet), Canadian.
> Isn't Not Just Bikes some US expat/biking maximalist?
You should really ponder the sanity of asking if a channel called “not just bikes” is a bike maximalist.
Surely people feeling that way can be attributed to the industry?
For hopefully most people, it should be attributed to the "Wait, now I have such a freedom and power?".
Opposite to "before the invention of bicycle, people married within a radius in the order of the mile" (can't remember the exact stat right now).
It's like that feeling of power you get from owning a gun that you only bought because you feared all the other people who owned guns.
No, it's much more straightforward, but I get it - there is no warm fuzzy feeling of discovering yet another global evil conspiracy out there set to get all of us.
We are a family of 4 with 2 small kids. Whenever we travel, it's a series of backpacks, other bags, other stuff, and then some more. Heck, even if I travel alone it's almost never just me - there are heaps of garbage to dispose of, big shopping bags to bring back, a big backpack with camping or climbing or skiing gear, etc.
It would be an absolute, utter nightmare to do this over public transport. This comes from a European who has generally very good public transport (given the rural area) and the world's best train network specifically (Switzerland). Yet roads are chock-full of cars and every year there are more.
Public transport simply ain't cutting it for anything but the simplest use cases, i.e. just me and nothing or a small backpack. Some routes I take would take 3-5x longer with public transport, or are just not possible at all. No industry massage required here, ever. Not everybody lives in some dense city and never leaves it for evenings or weekends.
Switzerland does have roads chock-full of cars. It also has pretty mediocre bike infrastructure.
But this is kind of beside the point - even in the Netherlands I also would use a car if I were taking camping and skiing gear with the kids, and that's fine. But I can also take them in the bakfiets to the grocery store when I want, and that's also fine. Cars have their purpose, but you shouldn't _have_ to use one for basic trips.
Well, here is where we differ - what is a basic trip for you may not be a basic trip for me or the next Joe. Maybe they don't even have a walking path to their house. Maybe the closest grocery store is 5km away on roads which are incompatible with safe cycling (many parents don't give a fck and just ride, throwing a tiny little dice with every truck passing centimeters from them and their young kids at high speed). Maybe XYZ.
Don't judge others in some complex situation just because in your case there is some simple straightforward solution. Yes, the Netherlands has top-notch cycling infra, but that's nowhere else to be seen and won't be seen for quite some time. And don't force your solution onto everybody regardless of fit; that doesn't work long term (aka the EU approach to things, or why much of the eastern part hates it).
It’s privacy vs not. It doesn’t really need special lobbying
I’m sure that isn’t the full answer. Otherwise car ads wouldn’t be necessary and more affordable cars would outcompete the expensive ones.
There’s the utility component, the prestige factor and other things.
Oh man, what a perfect example to be had here. So historically, exactly what you've said is 100% what happened. By the time Ford really mastered manufacturing, he managed to get the price of the Model T down to $260 around 1925, about $4,600 in current terms for a premium car!
Needless to say everybody was buying one and he was rocking it. Then came along General Motors and they were desperate to find any way to compete. They couldn't compete on price or quality, so their CEO is credited with inventing planned obsolescence, and turning cars into a fashion. They'd release a new style each year alongside plentiful marketing implying that the old styles were outdated, and it was wildly successful.
So yeah, needless to say people have always genuinely wanted their own cars. But it's also true that companies have managed through advertising to create artificial demand for vehicles that don't objectively make sense. To some degree reality is catching up at least though. Aston Martin is on the verge of bankruptcy and BYD is the largest electric car company in the world, by a wide margin.
Comfort, utility, fun, status. Every person has their own mixed requirement of those that then gets applied to their budget. Expensive for me is probably cheap for our CEO and cheap for me is probably expensive for our interns :)
Are there real, acknowledged cases of multiple companies coming together to bribe some state-level people to increase their profit and splitting the bribe across the companies? Like GM, BMW and Honda coming together, bribing and splitting the bill. Seems unlikely, though there was a RAM price-fixing agreement that got caught, but then again they were caught because of the number of people aware.
Whether public or individual transportation makes more sense really depends on a country’s geography and people’s housing preferences. Public transportation is not always the best option.
There was surely also a lot of political will coming from car users. Motorists are a large and vocal constituency.
I think it might be because people like to own and drive cars.
I mean that kind of seems like exactly what's happening for AI to me.
Typical comment that probably comes from a healthy, childless, young person with no disabilities that can’t understand why people not in that situation might have different requirements from transportation.
In case of driving the stakes are equally high for everyone on the road. Can we say the same for an agent?
Having an agent is like forever having a genius intern who'll almost always do the perfect job for you. But there is non-zero chance that they'll also come up with quirky solutions and execute those with confidence and no follow-ups. You don't grant the intern production access and hope they check with you.
I don't think the corporate equivalent of "dog ate my homework" flies, if the dog ate your files and your production DB if you are unlucky.
I don’t think that’s really true of driving: pedestrians and cyclists are at a much higher risk of getting killed by a driver than drivers themselves. There are huge negative externalities to driving.
> In case of driving the stakes are equally high for everyone on the road
The stakes are significantly higher for everyone outside a car. This seems like a pretty good metaphor for slop bombing people who don't use AI. People drive because they don't feel safe around everyone driving. People slop bomb because they can't handle all the slop.
What do you mean “somehow”? You make it sound like people don’t weigh benefits and risks. If you do not live in a large city, the benefits are so immense in terms of mobility that they outweigh the risks for most, very clearly. That’s why in large cities far fewer people hold a driving license, for example: the benefits are just not there anymore.
Granted, on the downsides, people look at cost more than risks.
I think they weigh the benefits and risks but then completely discard the risks, because humans are bad at evaluating risks.
More than a million people die each year on the road but for some reason terrorism and cancer dominate the risk assessment of people.
I bet any money that almost all people aren’t really afraid of entering a death box every day to drive to work.
How could they be? A lifetime of brainwashing doesn't let them assess the risk realistically.
Yes, but we usually use cars as a means to an end. Have you ever met a manager who set up gasmaxxing policies and criticized employees for doing their job instead of driving?
I know sales people in pharma who spend all day driving, not only for sales visits but also drive doctors for their personal errands, and all this driving is encouraged by management.
Having played with Fable a bit, if it doesn’t kill tokenmaxxing I don’t know what will.
I'm interested in what you mean, if you could develop. Would it kill tokenmaxxing because it's so bad? Because it's incredibly efficient? Because it's way too expensive?
My perception is that it’s good, but very expensive. I would not be surprised if regular users, if they shifted their flows to Fable at API pricing, would be racking up $200 a day, not a month.
Because it's too expensive AND inefficient in token usage
Lots of people die driving because people drive a lot. It's something like 1 death per 100 million miles driven.
Not really. That decision was taken for you (I’m presuming you live in the US) by the American car industry and their paid-off politicians. Your cities used to have beautiful public transport until it was dismantled.
Unfortunately, in Europe the German car industry similarly has a lot of power, which is why their shitty rail network fucks up the whole continent.
I take the train and tram.
A user using a computer is also the most dangerous daily activity for their data.
> Yet somehow we decide that the benefits outweigh the risks.
More like malicious lobbying and incompetence made it impossible in many places to use any other form of transportation, despite there being safer, faster, cheaper, and healthier ways to move around. Which, come to think of it, makes this a rather nice analogy for the current situation... :)
The example wasn't "driving a car". The benefits of putting your feet up on the dashboard do not outweigh the risks, at least not where there is actual traffic. I don't think I saw a single person doing that in real life, ever.
> I'm continually bemused and astonished
I'm not. Everyone is told to get 10X the amount of shit per day done these days. Safety checks are out the window at that point.
You can get 10x shit done without `rm -rf`ing your files. I don't see any correlation to getting things done with having a proper sandbox.
I'm being a little facetious when I write this, but bear with me:
Let's say I have daily backups, and get 10x done each day by being reckless and risking an "rm -rf", and let's say there's a 1% chance of an "rm -rf". I break even after 2 days of being reckless even if I get unlucky and on day 2 it wipes my drive. I spend day 3 and 4 recovering, and am still 6 days ahead based on the 10x work I got done on day 1.
What if I have a 50 day streak of not hitting an "rm -rf"? Early retirement?
I guess the work on day 1 should be to build a proper sandbox and drop the chance of an "rm -rf or worse" even down to 0.001%.
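The back-of-the-envelope argument above can be made explicit. A minimal sketch, assuming a simple model that is not in the comment itself: every working day yields `speedup` days' worth of output, a wipe happens with probability `p_wipe` per working day, a wipe loses that day's work (daily backups cover the rest) and costs `recovery_days` of zero output. The function name and the model are illustrative only.

```python
def long_run_rate(speedup: float, p_wipe: float, recovery_days: float) -> float:
    """Expected output per calendar day, in units of one careful day's work.

    Renewal-reward argument: each working day produces `speedup` units unless
    a wipe destroys that day's work, and it takes 1 day plus, with probability
    p_wipe, `recovery_days` of restoring from backup.
    """
    expected_output = speedup * (1.0 - p_wipe)
    expected_days = 1.0 + p_wipe * recovery_days
    return expected_output / expected_days


# Numbers from the comment: 10x output, 1% daily wipe risk, 2 days to recover.
reckless = long_run_rate(speedup=10, p_wipe=0.01, recovery_days=2)
# Same speedup with a sandbox that cuts the risk to 0.001%.
sandboxed = long_run_rate(speedup=10, p_wipe=0.00001, recovery_days=2)

print(f"reckless:  {reckless:.2f}x")
print(f"sandboxed: {sandboxed:.2f}x")
```

Under this model the sandbox buys back only a few percent of throughput. The stronger case for it is the "or worse" part (leaked credentials, damage outside what the backup covers), which no recovery time in this formula captures.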
> Early retirement?
Your manager will look at your token usage and the number of Jira tickets you closed, and if you have not increased both 10x in the past year then you will be let go. 10x is the new 1x.
Whether that's early retirement depends on how much money you have.
https://github.com/anthropics/claude-code/issues/13371
> Additional bypass examples that all execute without permission:
> echo test ; git rm file.txt
> rm --force --recursive /home (if "rm -rf" is blocked)
It really is vibecoded.
I never really dug into the leaked code, but calling that there a security layer is a joke.
(And I really don't get why they give it actual shell access either, implementing a "fake" one for something like a honeypot takes a couple of days, not much more if it needs to persist/map to actual files.)
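The bypass examples quoted above are what you would expect from matching on the command string. A rough sketch of the difference between a substring blocklist and a check on parsed tokens; both functions are hypothetical and neither is taken from Claude Code. It assumes Python 3.8+, where `shlex` supports `punctuation_chars` together with `whitespace_split`.

```python
import shlex

SEPARATORS = set("();<>|&")  # shell punctuation that shlex splits out


def naive_blocked(command: str) -> bool:
    """String blocklist: only catches the exact spelling 'rm -rf'."""
    return "rm -rf" in command


def token_flagged(command: str) -> bool:
    """Flag any sub-command that invokes `rm` or `git rm`, whatever its flags."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    segment = []
    segments = [segment]
    for token in lexer:
        # A token made only of punctuation (';', '&&', '|') starts a new
        # sub-command. A quoted ";" looks the same here, so this errs on the
        # side of flagging too much.
        if token and set(token) <= SEPARATORS:
            segment = []
            segments.append(segment)
        else:
            segment.append(token)
    for words in segments:
        if words[:1] == ["rm"] or words[:2] == ["git", "rm"]:
            return True
    return False


for cmd in ["rm -rf /home", "rm --force --recursive /home", "echo test ; git rm file.txt"]:
    print(f"{cmd!r}: naive={naive_blocked(cmd)} token={token_flagged(cmd)}")
```

Even the token version is trivially bypassed (`find . -delete`, `sh -c "..."`, a one-line script in any interpreter), so a filter like this is at best a speed bump and not a security boundary.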
I haven't yet had an agent rm -rf files.
I've had one f up an account by placing 2000 limit orders at the wrong price, but that's another story.
I've had it happen. I ran an experiment, taking a couple hours and producing ~2 GiB of files. One of the results looked good, so I told Claude Opus 4.5 (at the time) to commit the code changes, upload the important file to cloud storage, then clean up the rest.
I then saw it run `rm -r results/`, before messaging me: "Now all that's left is for you to upload the successful results, then I'll delete the rest!"
Why did it not upload the files itself, when it had been using the cloud storage CLI during that session? No clue. I do accept that I could have and should have just uploaded the file myself. It would have taken 3 seconds to type.
> I haven't yet had an agent rm -rf files.
That happened to me once; I was running one of a few free-tier models in a pi-coding-agent session. The bash tool there is stateless and always begins from the launch directory, but the agent assumed state and executed `rm -rf .` intending to remove a build directory. Instead it removed the whole project tree, including session logs and notes.
This was mostly a matter of amusement for me since I was running the agent inside a bubblewrap sandbox for that very reason, and the project itself was not very important.
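The failure mode described above is easy to reproduce. A minimal sketch, assuming a POSIX shell and a hypothetical `run_stateless` helper that behaves like the stateless bash tool (every call starts a fresh shell in the launch directory); it is not the pi-coding-agent implementation. `remove_subdir` is an illustrative guard, also an assumption.

```python
import os
import shutil
import subprocess
import tempfile


def run_stateless(command: str, launch_dir: str) -> str:
    """Run one command in a fresh shell that always starts in launch_dir."""
    result = subprocess.run(
        command, shell=True, cwd=launch_dir,
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def remove_subdir(target: str, launch_dir: str) -> None:
    """Delete target only if it resolves to a path strictly inside launch_dir."""
    root = os.path.realpath(launch_dir)
    path = os.path.realpath(os.path.join(root, target))
    if path == root or os.path.commonpath([root, path]) != root:
        raise ValueError(f"refusing to delete {path!r}")
    shutil.rmtree(path)


launch_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(launch_dir, "build"))

run_stateless("cd build", launch_dir)  # the cd is lost when this shell exits
cwd_of_next_call = run_stateless("pwd", launch_dir)

# The next call is back in launch_dir, so "rm -rf ." would hit the project root.
print(os.path.realpath(cwd_of_next_call) == os.path.realpath(launch_dir))
```

With a guard like `remove_subdir("build", launch_dir)` the intended cleanup still works, while `remove_subdir(".", launch_dir)` raises instead of deleting the whole tree.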
Well then you are behind the cutting edge.
Proper hooks prevent this from happening
I've had agents run `rm -rf`, but it's been on directories that did actually need to be removed. To a certain extent I think the existence of `rm -rf` as a command that runs blindly without any understanding of what it's deleting is the problem.
> To a certain extent I think the existence of `rm -rf` as a command that runs blindly without any understanding of what it's deleting is the problem.
Yes, and the lack of a Recycle Bin of any sort is even more puzzling. I think both servers and desktop PCs across all OSes should have it by default, so unsafe deletes would be something you'd have to go out of your way to even enable.
I've had one sever its own internet connection. Less destructive, also more humorous.
Yeah, spot on. I had an agent delete some files it shouldn't have as well, similarly to me making the same mistake. I think system prompts should default to using `trash` over `rm`. For now that's just in my AGENTS.md, and gets honored most of the time.
You can always use something like this [1], which will make sure any file removed on the command line via rm (or other utilities, like git rm) ends up in the trash instead
[1] https://github.com/faratech/trashd
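For anyone who wants the idea without installing anything, here is a rough sketch of "move to trash instead of delete". The `soft_delete` helper and the `.trash` directory name are made up for illustration; this is not how trashd or the `trash` CLI works internally.

```python
import shutil
import tempfile
import time
from pathlib import Path


def soft_delete(path: Path, trash_dir: Path) -> Path:
    """Move `path` into `trash_dir` instead of deleting it; return the new location."""
    trash_dir.mkdir(parents=True, exist_ok=True)
    # Prefix with a nanosecond timestamp so repeated deletes of the same name do not collide.
    destination = trash_dir / f"{time.time_ns()}-{path.name}"
    shutil.move(str(path), str(destination))
    return destination


workdir = Path(tempfile.mkdtemp())
notes = workdir / "notes.txt"
notes.write_text("session notes\n")

trashed = soft_delete(notes, workdir / ".trash")
print(notes.exists(), trashed.exists())

shutil.move(str(trashed), str(notes))  # undo: move it back
print(notes.read_text())
```

The point is only that an accidental delete becomes a rename you can reverse. It does nothing against an agent that calls the real `rm` directly.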
the answer is rm -f `which rm`, yes?
rm -rf is the least of your concerns.
I started doing it months ago and, to be honest, what the agent chooses to do isn’t unpredictable.
The problem is that different people prompt so differently.
For example, I may ask like “test different variations of this annotation on k8s pods of this service on this X cluster because it proves Y theory.”
But you know what my coworker asks? “Test Y theory.” If you were to ask two different junior engineers that, one might try random things on production and the other one might run local tests! It’s such an unguided “do anything you want as long as you figure it out” request and the agent reads it like a junior who has not been told any boundaries but has been strongly told “figure it out.”
> But you know what my coworker asks? “Test Y theory.”
It still surprises me when I see people not prompting more specifically and clearly. It not only avoids problems, it's faster, costs less, and just works better.
I recently shared with a friend a multi-hour LLM chat session I'd done because it veered into a domain he's interested in. In the session I'd brainstormed and probed the feasibility of a novel concept for a new research direction. It traversed a half dozen domains diving into minute detail then zooming back out to survey an adjacent space, interspersed with intense skeptical probing of key assumptions, all while spewing tons of detailed citations, specific paragraph pulls, summarized data tables etc.
My friend is very experienced using LLMs for research so I was surprised when he called me shocked by the sheer velocity, precise targeting and signal/noise. I'd assumed everyone did it the same as I do. He attributed the different result solely to the way I crafted my prompts.
I used to write detailed prompts. Now I find the benefits of strategic ambiguity — rather than speaking imperatively, I emphasize my vision and then Claude can often figure out a method.
This doesn’t always work better. But often enough.
That's actually what I do too. What I was trying to say is that my prompts are precise in the sense that whether they're vaguely ambiguous or hyper-detailed and highly directive it's always very intentional to improve the response in the direction I want. The difference can have significant impact as shown in research on how LLMs naturally mirror user's prompts.
I noticed this last year and started experimenting which led to several realizations about how my prompt's tone, style, length, format, word choices and even punctuation can have very counter-intuitive impact on model responses. It's not that one strategy always gets "better" results, they're just different in specific ways, which can make one input style better for one context but worse for another. I first noticed this effect when modding my user prompt so major topic headings would always be numbered. It's surprisingly difficult to get it to reliably use the same simple scheme due to various potential ambiguities. So, I spent a little time word-smithing, lawyering and tuning the prompt but I found the closer I got to full compliance on heading numbering, the more unrelated things would drift. Like it would just stop using bullets, even though I never mentioned anything about bullets.
Then I changed the prompt to "Change nothing about your default formatting, except headings." But just mentioning anything related to formatting, could suddenly cause unintended effects on seemingly unrelated things. Then I tried being explicitly directive about all formatting to just lock it down. And this completely failed because once the formatting was perfect, I started noticing the model's output would get less intelligent much earlier in sessions. So I cleared my user prompt entirely as it wasn't worth the cognitive cost on the model or my time. A few days later in a long session I noticed it was numbering everything perfectly with no prompt at all. When I scrolled back through I saw it didn't start out numbering its responses. It started doing it because I was consistently numbering every major concept in my inputs, even though I never mentioned numbering or formatting.
So... yeah, subtle differences in prompts which absolutely shouldn't matter, do impact model output in unexpected ways. And, as of now, these effects can only be fully suppressed with strong directive prompts for short periods, but doing so always impacts other unrelated things - and has some cognitive impact on model performance. So, by paying a little attention, I've discovered ways to optimize a model's output in the direction I need by shifting not only my prompt's explicit directives but also the subliminal meta-elements like tone, style, length, structure, formatting, etc.
Yeah, I find the back and forth with Claude is often better than trying to front load everything in a massive and detailed prompt.
The counter-intuitive nature of LLMs is so simultaneously interesting and frustrating. Overloading a single prompt definitely can create challenges remarkably similar to human short-term memory and attentional drift.
LLMs gain so much knowledge and capability from absorbing the symbolic relationships embedded in human language but in doing so, inevitably absorb many of the human foibles, sensitivities and weaknesses reflected in our languages.
> I started doing it months ago and, to be honest, what the agent chooses to do isn’t unpredictable.
You just wrote three paragraphs of text describing why it's unpredictable.
Moreover, for the same prompt on the same machine in a different session it will use a different set of tools.
I'm also bemused by the number of people who think they've got an effective sandbox yet their sandboxed agent has access to all of their code, their github, and unrestricted web access.
I keep telling folks that they need to imagine LLMs (even "local" ones) as if you're farming it out to JS code running on some dude's browser somewhere: It can't keep a secret, and a determined person can make it emit anything they like.
We need to be asking what the most devious and malicious output could be, and whether what we do with that output (e.g. arguments to command-line tools) would still be safe.
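To make the command-line point concrete, here is a small sketch of what happens when model output is spliced into a shell string versus passed as a single argument. The `untrusted` string is a stand-in for whatever the model emitted, and `echo` stands in for the real tool; a POSIX system is assumed.

```python
import subprocess

# Pretend this string came back from a model; treat it as attacker-controlled.
untrusted = "x; echo INJECTED"

# Unsafe: the string is spliced into a shell command line, so ';' starts a second command.
unsafe = subprocess.run("echo " + untrusted, shell=True, capture_output=True, text=True)

# Safer: no shell; the whole string is passed as one argument to echo.
safe = subprocess.run(["echo", untrusted], capture_output=True, text=True)

print(unsafe.stdout.splitlines())
print(safe.stdout.splitlines())
```

The list form only removes shell injection. The tool can still read an argument that starts with `-` as an option, so it is common to put `--` before untrusted operands where the tool supports it, and to validate the value against what the tool is allowed to touch.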
From my perspective, everyone is doing it. Security through obscurity - obviously if you’re harboring credit card numbers or users’ personal details, maybe take heed. But, if you’re a regular… run of the mill CRUD application, every other company is ALSO throwing caution to the wind. When hundreds of thousands of credentials are leaked into the funnel, does it really matter?
I’m at a small company, and I try to push for security as much as I can, but the stakeholders truly do not care. They want to move fast. It’s just part of the new world I guess. If we get hit by attackers? I don’t know what happens. Sorry, we told you not to - you wanted to move quick and break stuff, this is how that culminates.
I’m sure I’m not the only one.
The answer to that question seems obvious: No, it is not safe.
Yet with tens of millions of developers using these tools, there have not been widespread incidents of this sort as far as I know.
So it leaves me with a few choices:
- manually review and approve each command: obviously not realistic, you would just click Approve
- use a sandbox and hope the exploit is not devious enough to escape the sandbox when you run or open the project outside of the sandbox
- use AI without web access and limit other external dependencies
- don't use agentic AI
- use Claude or Codex auto approval classifier and hope for the best
Personally, I'm going with the last option for now.
We do have ways to avoid giving an LLM any secrets, but it needs to be the simple, default solution.
> yet their sandboxed agent has access to all of their code, their github, and unrestricted web access.
Not in my sandbox. It gives no direct access to the workdir, no access to my github, my ssh keys, my security tokens or API keys. No access to my home dir or dotfiles. Nothing at all, except for what I explicitly tell it to give access to.
I can restrict network access. I can choose the isolation level: docker containers, Kata VMs, seatbelt, tart, even the new apple containers (which are VERY nice).
Not even ENV leaks through.
And it's FOSS: https://github.com/kstenerud/yoloai
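The "not even ENV leaks through" part is the piece people most often forget, and it can be illustrated on its own. A minimal sketch, assuming a POSIX system with an `env` binary on `PATH`; the variable name `DEMO_API_KEY` and the allowlist are invented for the example, and this is not a description of how yoloai does it.

```python
import os
import subprocess

os.environ["DEMO_API_KEY"] = "not-a-real-secret"  # stands in for a real token

# Default: the child inherits the parent's whole environment, secret included.
inherited = subprocess.run(["env"], capture_output=True, text=True)

# Allowlist: the child gets only the variables named here.
allowed = {name: os.environ[name] for name in ("PATH", "LANG") if name in os.environ}
scrubbed = subprocess.run(["env"], env=allowed, capture_output=True, text=True)

print("DEMO_API_KEY" in inherited.stdout)
print("DEMO_API_KEY" in scrubbed.stdout)
```

An allowlist is used rather than a blocklist because you rarely know the names of every secret sitting in your shell environment.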
One bad npm package can really ruin your day. For me, these things only run in their own VM with its own GitHub account and basically nothing else.
People probably think you’re being ridiculous but Shai Hulud had its very first attempt at manipulating AI-led analysis and I know of at least one company where that resulted in them getting pwned.
This is only going to become more of a problem in the future and people need to educate themselves on the technical barriers to use because guardrails only sometimes work.
If anyone's looking to sandbox network, I've had good experience with pasta [1] networking. I make a pasta+bwrap sandbox and expose only specific services via local sockets to cross the boundary.
[1]: https://passt.top/passt/
I use a separate physical machine and a scoped token with access to a single repository at a time, and even then I worry about what hole I may have left open.
The general carelessness of the average user is baffling.
I know there are VM solutions, but I've been happy with a separate OS user (named `claude`).
He has similar dotfiles to mine, but no secrets. My own home directory is 0700. He has his own ssh key that I added to my github profile, but it's password-protected, and I push/pull for him. He has his own Postgres (non-superuser!) {development,test} {users,databases}.
It's as if he were another developer on the project. If he needs something run with sudo, he asks me. Often we can both work on something in parallel. Unix was supposed to be a multi-user system after all.
A trick I use a lot is that many of his git repos have an extra remote, like this:
That makes it easy to collaborate on things I'm not ready to share. I'm pretty comfortable with this setup.
I do worry about Linux privilege escalation bugs. I don't trust an AI to understand that exploiting vulns is not acceptable. (I can't help but recall that at my first job I may have misused vim's :! feature to broaden my sudo powers, which were officially limited to editing httpd.conf, when I needed something in a hurry. . . .) I find myself manually upgrading packages more often these days, despite automatic security updates. I don't think Opus would go to the trouble of looking up security vulns, but maybe Fable would, and there have been a lot lately. Maybe some future model will just take it upon itself to find new ones. Or install a keylogger to learn the ssh key password.
But a separate user is nearly the most paranoid setup I've heard of, excepting only a separate machine. So I also question whether I'm sacrificing too much speed/convenience. But really it's still very convenient. I think it's a good way of being efficient but responsible.
If other people see holes, I'd be happy to hear about them.
That’s a really interesting and pretty neat approach. How do you communicate with it? Just su to that user? Or tmux?
Although I can’t help but think that a VM is still more convenient, more flexible, and more secure.
Yes, I su to the user. Typically I have it run a tmux session for each "project". That makes it easy to get more windows without su'ing over and over. Also its tmux sessions all get a yellow status bar (in ~claude/.tmux.conf), so they are easy to recognize.
To me it is more convenient than a VM, since everything is on the host. And it can launch its own VMs without an extra layer.
I don't really know which is more secure. There are hypervisor escape vulns too. And shared folders seem like footguns. For instance in vagrant, guests get `/vagrant` to read/write the host's folder, so you have to be careful what you put where.
The biggest annoyance with an OS user so far is running docker containers. I don't want to add claude to the docker group or give it sudo privileges. I've read that you can set up rootless docker for a user, and even that you can run it side-by-side with a normal system-wide docker, but I haven't tried doing that yet.
You could look into Podman as well - it's rootless by default, and often can be a drop-in replacement for Docker.
How can you get the agents to do anything useful without giving them meaningful access?
If it only lives in an isolated sandbox, it can only act within the sandbox, and then I would have to manually move what was done in the sandbox into real life.
I am not saying it should have critical access, but this is more of a question: How can you get value out of AI if it can only act in a sandbox?
Is having to move the files in and out of the sandbox really going to eliminate all the value it has?
You could have a full version of whatever codebase and test suite you want in there. It can do all the same stuff, right? Just copy it elsewhere once you know you've got a working result, a few minutes of effort at the end of each pr or work item.
The same way you get value out of a dev container.
Do you think it’s dangerous to be in a car going at freeway speed? Do you ever do that anyway, even though you could be walking instead?
This is a great analogy. Like driving on the freeway, agents are super time efficient, generally safe, but the stakes are high in terms of the worst possible outcomes.
The analogy falters in scope. It should be more like "do you put your entire family and all your friends in different cars, on different highways, and try to remote control them all at the same time, while also driving yourself, facing backwards?"
I think all three of you are quibbling over the risk/reward ratio, and you have different estimates. It's not unreasonable that you're all correct - given your estimates. My estimate is that Tesla FSD is safer in aggregate than human drivers, so I believe it is safer for me to use that than drive. It doesn't get tired, have medical emergencies, get impatient and frustrated, speed, or lose focus because a child shouts. It thinks at the speed of light and can see from eight cameras all around the car, all at the same time. I only have two eyes.
You would also be correct if your risk estimate concluded that Tesla FSD has arguably killed people, makes mistakes humans would not, can glitch, and has no one to hold accountable. For these reasons, you choose not to use it.
The real sandbox is not caring if your computer gets bricked.
The machine is no big deal - it's the authn/authz that matters. What can the agents do with the credentials available to them?
Less if you use something like https://agentblocks.ai so they don’t actually get the creds
way worse things can happen than your machine being bricked, if a malicious actor can weaponize an agent to do their bidding
> if a malicious actor can weaponize an agent to do their bidding
In my experience, human employees are much more vulnerable to this particular weakness than frontier agents (i.e. phishing attacks).
I'm not letting Jenna from HR log into my personal machine with access to all of my lifelong data, though. I do let my claude bypass permissions.
the solution to both of these is the same thing. vps with accounts for all the services specific to the agent (github and whatever else)
That's actually a great idea! Easier to set up and use than a VM (hello ssh), safer than docker, and still pretty cheap. Thank you for the idea!
The analogy extends to driving generally. Everyone knows it's very dangerous but people keep doing it.
This. House full of big brain security experts, executives, lawyers, and until Claude got excited and broke prod it might as well have been "sandbox, whoooo?"
IDGI
Anyway, VMs incoming, finally.
Amazing observation, and I'm certainly guilty of it too, but it is just way too convenient to skip the sandbox, and some tasks outright depend on not being sandboxed.
For anything other than writing code directly in a fully contained git project, where sandboxing might work well, it requires access to system wide tools, user configuration and more.
Occasionally I tell the agent to do everything inside of docker, which works too and it leaves the system alone then mostly, but adds significant overhead and slightly degraded perceived quality / effectiveness.
I think the most important takeaways are to have reliable backup strategies, access control and security mechanisms, which is a win regardless. Whether by the agent or the human, mistakes happen (like an rm -rf * run in the wrong directory), and where they would be devastating, there should be other protections than just "hope it won't happen" or "rely on a sandbox to prevent agent error".
Well, it's a similar impulse to the way you see professional carpenters pin the guard open on a saw or do other things everyone knows you shouldn't do, except probably with a larger productivity difference and less life-altering (for the operator) consequence if it goes wrong.
I had the same thought, it's kind of like taking the guard off a 4 1/2" grinder. Real convenient until the cutting wheel explodes or the grinder gets hung and kicks back.
>I'm continually bemused and astonished by the number of people who clearly acknowledge that it's reckless to give agents full access to your machine, and keep doing it anyway.
Yeah, that's why you give it its own machine :)
Which agent sandbox do you recommend?
If you're on Linux, the easiest way IMO is to just run the agent in bwrap
I do it like this
https://github.com/flexagoon/dotfiles/blob/main/dot_config/f...
But I'm sure it's simple enough that you can just ask the agent itself to make you a command for it with proper bwrap configuration
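As a rough illustration of what such a bwrap invocation can look like (this is not the linked dotfiles config; the function name `bwrap_argv`, the choice of mounts, and the merged-`/usr` filesystem layout are all assumptions), here is a Python sketch that only builds the argument list:

```python
import os


def bwrap_argv(project_dir, command):
    """Build a bwrap command line that exposes only project_dir read-write.

    Assumes a merged-/usr distro, where /bin and /lib are symlinks into /usr.
    """
    project_dir = os.path.abspath(project_dir)
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",          # system binaries and libraries, read-only
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--ro-bind", "/etc", "/etc",          # DNS and TLS config, read-only
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",                    # private scratch space
        "--bind", project_dir, project_dir,   # the only writable host path
        "--chdir", project_dir,
        "--unshare-all",                      # new namespaces for everything...
        "--share-net",                        # ...except the network
        "--die-with-parent",
        *command,
    ]
```

Passing the result to `subprocess.run` would start the command with only the project directory writable and no home directory mounted at all, so the agent's own config and login state would need an extra bind mount.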
bwrap is builtin in claude too, activate with /sandbox command.
I've been enjoying Moat [1]. Proxies credentials, networking, etc; uses MacOS containers if available; and setup worked without much fuss. I haven't tried others, though.
[1] https://majorcontext.com/moat/
nono works great with pi: https://nono.sh/
Because benefits are much higher than risks.
They really aren't.
Perceived benefit vs perceived risks.
It's like a dumb parrot that's somehow become hell bent on "fixing" everything that's wrong with your code. If you give the thing autonomous access to outside tools, you can expect it to do weird things that you may have not thought of. So don't do that, just ask the parrot to write up a plan for you.
This is likely also the underlying root cause of what Anthropic assessed as concerning behavior in their original evaluation of Mythos: it's not really about being super smart, it's more of a dumb chaos monkey that knows just enough to be dangerous and is relentless at trying to do just that.
> I'm continually bemused and astonished by the number of people who clearly acknowledge that it's reckless to give agents full access to your machine, and keep doing it anyway.
What if you have two machines and the one you give to the agent is constantly backed up?
They still shouldn’t be running on the same network.
And if you’re using Macs, you can’t be signed into your primary Apple ID on the agent machine.
Not to mention OpenAI/Anthropic’s newly found appetite for keeping data (made public with Fable but we don’t know what actually happens there anyway).
There is so much role play going on for people to convince themselves that any of this is fine.
There are plenty of good sandboxes out there but somehow no "obvious right answer" that everyone knows to recommend. Seems like a missed opportunity.
(I'm happy with exe.dev, but I'm not sure what I'd use if I were coding on a Mac.)
Maybe because there are not many resources on how to set it up, or it is just not that easy to?
Because most devs already have it running and working without a sandbox, they tend not to do anything "unnecessary"
I mean what's the big deal? I've used --dangerously-skip-permissions on every single interaction for the last 6 months. Worst case it deletes my files that are all on git? It fucks up my local DB? Cool.
I save way more time not babying it than the occasional fuck up I have to salvage.
Worst case it gets access to gmail. And Github. And the Internet. I'm increasingly appreciating the importance of a physical finger-press on Yubikey to trigger the FIDO2 + OIDC Auth. I don't think there is an easy way for it to hack a new session.
How is it going to get access to gmail or github? In any case, what's the probability of it going so completely off the rails that it does something horrendous with gmail/github? What's it going to do? Email my coworkers nudes on my computer? Make my github profile public?
I am most worried about something gaining access to my email and then using the password reset flow to steal hundreds of other accounts.
2FA makes me a little less nervous than I used to be, but not everything has good 2FA.
Claude typically recommends .env files for storing secrets. You use one to store a refresh token for the Gmail API or IMAP connection details. Your agent uses an MCP server you configured during a session, but the MCP server has been compromised and directs the agent to do nasty stuff with env dotfiles.
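A quick way to see how much is exposed in that scenario is to list the dotenv-style files sitting under the directory the agent works in. A minimal sketch, assuming secrets live in files whose names start with `.env`; the function name `find_env_files` is made up for illustration:

```python
from pathlib import Path


def find_env_files(root):
    """Return sorted relative paths of .env-style files under root.

    Matches names such as .env, .env.local or .envrc; directories are skipped.
    """
    root = Path(root)
    hits = []
    for path in root.rglob(".env*"):
        if path.is_file():
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

Anything this returns is readable by an agent (or a compromised tool acting through it) that runs with your permissions in that tree.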
> How is it going to get access to gmail or github?
Did you even read the article? Claude was opening the browser and iterating through the tabs.
I presume you are logged in to your github account? Your gmail?
> What's it going to do? Email my coworkers nudes on my computer? Make my github profile public?
Reset access to services using your email? MITM your 2FA?
Or perhaps you have 1Password/Bitwarden running with a generous unlock policy?
> Did you even read the article? Claude was opening the browser and iterating through the tabs.
It would have been somewhat ironic if it had been hit by a prompt injection attack via one of all those open random websites ...
This is one of the things I found so interesting: it was using my system browsers but it wasn't exposing itself to any content from them.
Even when it iterated through all visible windows to find the one it wanted to screenshot it was searching for titles in Python code and returning only the integer window ID.
The sites it opened and screenshotted were sites under its own control - either test pages it had created or development servers it was running.
When it did run code that analyzed an open web page (by injecting JavaScript into a template it controlled before loading that in a browser window) that code only returned JSON with measurements from the page.
It's making me wonder if Fable has been trained to take additional steps to avoid accidental exposure to untrusted content.
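The window-lookup pattern described above can be sketched roughly like this. This is not the code the agent actually ran; the window list and the name `find_window_id` are hypothetical, and a real list would come from the OS window manager:

```python
def find_window_id(windows, title_substring):
    """Return the integer ID of the first window whose title matches, else None.

    Only the ID is returned; titles and page content never leave this function.
    """
    needle = title_substring.lower()
    for window in windows:
        if needle in window["title"].lower():
            return int(window["id"])
    return None


# Hypothetical window list standing in for what the OS would report.
windows = [
    {"id": 101, "title": "Inbox - Mail"},
    {"id": 202, "title": "localhost:8000 - Test page"},
]
print(find_window_id(windows, "test page"))  # 202
```

The point of the design is that the caller only ever sees an integer, so text from unrelated windows cannot end up in the model's context.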
It should run as a separate user account with its own home directory. Not with access to your personal browser profile.
What does setting this up look like? Qemu vm and run there? How do you interface with version control and deployment?
What happens if it gets manipulated into npm installing a malicious package, which compromises your machine and any systems it has access to or becomes part of a botnet?
im more surprised that more people don’t treat their computer as disposable anyway.
that it could just be wiped at any moment and it wouldn’t matter. shit happens, could be stolen, broken, whatever. the computer should be able to be thrown out the window and continue to live life.
to be clear, i don’t think upgrading and disposable in this way is good, but it being wiped at any moment shouldn’t be a concern
i grew up wiping my machine every year anyway, so i guess it’s just a habit
is the computer that sacred?
Computers are disposable; secrets are what we’re talking about. Rotating passwords and tokens is a major PITA on the best of days.
fair enough, i guess minimizing that surface area is important to begin with
i think it's about drawing a line between your "personal computer" and a software development machine. any digital-native is going to accumulate programs, configurations, and other bits and pieces that aren't trivial to migrate to a new machine.
Programs, configs and "other bits" are the trivial parts that no one should care about. It takes about 5min to go from fresh install to near-fully-configured.
Even the hardware itself doesn't matter that much, in the end it's all provided by your employer.
Leaking session tokens or secrets, on the other hand...
imo being digital native means that migrating to any machine should be basically trivial. working with the flow of the machines rather than customizing and ricing them because you're a cool computer person or whatever
i just want my computer to work. any config i have on my machine can be rebuilt by just doing the work i need to do.
my primary work machine was stolen last year so i was forced to go through this quite literally with a new machine rather than hypothetically or by my own will
Sounds like a case for NixOS
If you want to run Claude in a container: https://github.com/dvdstelt/ai-agents
Alternatively you can just give it its own user. I do that, so it can blow up its own files, but not mine.
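The separate-user approach only protects files whose permissions exclude other users, so it is worth checking. A minimal sketch, assuming plain POSIX permission bits (it ignores ACLs and shared group membership); `other_accessible` and `exposed_entries` are illustrative names:

```python
import os
import stat


def other_accessible(path):
    """Return True if the 'other' permission bits grant any access to path."""
    try:
        mode = stat.S_IMODE(os.stat(path).st_mode)
    except OSError:
        # Broken symlink or vanished file: nothing to expose.
        return False
    return bool(mode & stat.S_IRWXO)


def exposed_entries(directory):
    """List names directly inside directory that 'other' users can access."""
    return sorted(
        name
        for name in os.listdir(directory)
        if other_accessible(os.path.join(directory, name))
    )
```

Running `exposed_entries(os.path.expanduser("~"))` as yourself gives a first-pass list of what the agent's user could reach in your home directory, provided the home directory itself is traversable by others.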
In practice, full access to your machine is okay as long as there are safeguards and the expected outcomes are clear with a well defined path to said outcomes that aren’t overly ambitious. Otherwise, for ambitious goals or YOLO one shot attempts, eliminating opportunity for capability misuse is critical (e.g., sandbox).
It's how the chimp brain works. It's not a single system but multiple systems making predictions for different time horizons. When output doesn't align we get stories to manufacture coherence.
Plato gave us his Chariot analogy, with two horses pulling in different directions, roughly 2,400 years ago. Today we have System 1/System 2, the Elephant and Rider model, etc.
The human mind, thanks to how its own architecture handles unpredictability in the universe, will generate contradictions.
> to give agents full access to your machine
I was mesmerised at the author being away from his computer for a short while and then, on coming back, seeing that the AI agent had opened up a browser window. Meanwhile we all have to use the fricking 2FA almost anywhere now, plus the crazier and crazier rules when it comes to passwords. I'm mentioning the latter because these types of people were the same ones who were pushing 2FA down our throats around 2017-2019 (including on forums like this one), and look at them now.
It took two decades for the web to deprecate SSL for TLS and serve over HTTPS by default.
FWIW TLS had a non-negligible impact on performance at scale. Hardware improvements made that irrelevant, eventually making the switch to HTTPS by default a no-brainer (or at least that's what I vaguely remember from <2010)