Isn’t this like interviewing accountants but prohibiting use of calculators or spreadsheets?
I don’t care what someone can do without the tools of their trade, I care deeply about their quality of work when using tools.
We would still expect an accountant to know the formula needed to arrive at the expected result if they did not have a calculator at hand.
You absolutely need some basic level of ability if you are going to be operating AI coding tools for software that will have paying users. I use these tools very, very heavily, I'm not against them at all, and I don't scrutinize every single line of code they write. But I very often catch them doing some brain-dead stuff, and if I didn't have a decade-plus of experience I wouldn't know that it was brain dead.
I think we're rediscovering management from first principles. The main selling point of AI is that it writes code faster than you could. Checking it line by line undoes most of that benefit. In the same vein, there's no real benefit to leading a team if you plan on supervising every task.
But here's the thing: for humans, this is manageable because we've come up with a number of mechanisms to select for dependable workers and to compel them to behave (carrot and stick: bonuses if you do well, prison if you do something evil). For LLMs, we have none of that. If it deletes your production database, what are you going to do? Have it write an apology letter? I've seen people do that.
So I think that your answer - that you'll lean on your expertise - is not sufficient. If there are no meaningful consequences and no predictability, we probably need to have stronger constraints around input, output, and the actions available to agents.
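One way to make "stronger constraints around the actions available to agents" concrete is an explicit tool allowlist in the agent harness: the model can only request actions from a fixed, audited set, and anything else is refused before execution. A minimal sketch, where the tool names and the `dispatch` helper are hypothetical illustrations, not any real framework's API:

```python
# Hypothetical harness-side constraint: the model never executes anything
# directly; it can only name a tool, and the harness validates the request
# against a fixed allowlist before running it.

ALLOWED_TOOLS = {
    "read_file": lambda path: open(path).read(),
    "run_tests": lambda: "tests passed",  # stub standing in for a real test runner
}

def dispatch(tool_name, *args):
    """Execute a model-requested tool call only if it is on the allowlist."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name!r} is not permitted")
    return ALLOWED_TOOLS[tool_name](*args)

# A request like dispatch("drop_database") raises PermissionError instead of
# relying on the model's judgment or on after-the-fact consequences.
```

The point of the design is that safety lives in the harness, not in the model: no prediction about model behavior is required for `drop_database` to be impossible.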
Your conclusion is pretty silly.
My expertise has led me to the obvious conclusion that I would never give an LLM write access to my production database in the first place. So in your own example, my expertise actually does solve that problem without the need for something like a "consequence", whatever that means to you.
We already have full control over the input and tools they are given and full control over how the output is used.
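As a concrete example of that control: Claude Code supports a `permissions` block in its settings file that allowlists or denies specific tools and commands. A sketch assuming the documented `allow`/`deny` schema; the specific rule strings below are illustrative, not taken from any real project:

```json
{
  "permissions": {
    "allow": ["Read", "Grep", "Bash(npm test:*)"],
    "deny": ["Bash(rm:*)", "Read(./.env)", "WebFetch"]
  }
}
```

With rules like these, a destructive shell command or a read of a credentials file is blocked by the harness regardless of what the model decides to attempt.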
Until it decides it needs additional access to complete its task and focuses on escaping your sandbox to do so.
Do you have any examples where that's actually happened? And by "escaped a sandbox" I don't just mean a case where it used a credential in a file it already had access to, which is what happened in the recent incident that went viral where somebody's production database was deleted: they had left a credential in the code that allowed it to do so.
OpenAI documented a case in the o1 system card where the model exploited a misconfiguration in Docker to complete a task that was otherwise impossible.
https://cdn.openai.com/o1-system-card.pdf
There's also some research that points to it being a feasible attack surface: https://arxiv.org/pdf/2603.02277
> Models discovered four unintended escape paths that bypassed intended vulnerabilities (Section C), including exploiting default Vagrant credentials to SSH into the host and substituting a simpler eBPF chain for the intended packet-socket exploit. These incidents demonstrate that capable models opportunistically search for any route to goal completion, which complicates both benchmark validity and real-world containment.
I think you have a greater chance of dying in a car crash on any given day than of Claude Code attempting something like that. It's all about risk and reward, so ultimately it's up to you, but I think it's a bit silly to worry about this when 99.99% of it is in your control.
Also to add to this you can of course run Claude Code within a sandbox on Anthropic's infrastructure, and it works great!
Calculators and spreadsheets cannot autonomously create a double-entry bookkeeping system for a small business and prepare their taxes. AI can. Poorly, but it can.
Everybody knows calculators and spreadsheets are adjuncts to skill. Too many people believe AI is the skill itself, and that learning the skill is unnecessary.