I've given Claude explicit rules and instructions about what it can and cannot do, and yet occasionally it just YOLOs and ignores them ("I'm going to modify the database directly, ignoring several explicit rules against doing so!"). So yeah, no chance I run agents in a production environment.
Bit of a tangent, but with things like databases the LLM needs a connection to make queries. Is there a reason why no one gives the LLM a connection authenticated as the user? Then the LLM can't do anything the user can't already do. You could also make only read-only connections available to the LLM. That's not something enforced by a prompt; it's enforced by the RDBMS.
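Rough sketch of what I mean by "enforced by the database, not the prompt". This uses SQLite from Python only because it is self-contained; with a server RDBMS you would presumably hand out a role that only has read grants instead. The `users` table, the temp file path, and the helper names are all made up for illustration.

```python
import os
import sqlite3
import tempfile


def make_demo_db(path: str) -> None:
    """Create a tiny database through a normal read-write connection."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('alice')")
    conn.commit()
    conn.close()


def open_read_only(path: str) -> sqlite3.Connection:
    """Open the connection we would hand to the agent.

    mode=ro is enforced by SQLite itself, so no prompt can talk its way past it.
    """
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def agent_can_write(conn: sqlite3.Connection) -> bool:
    """Simulate the agent ignoring its instructions and trying a destructive query."""
    try:
        conn.execute("DELETE FROM users")
        return True
    except sqlite3.OperationalError:
        # SQLite refuses writes on a read-only connection.
        return False


# Hypothetical demo database in a temp directory.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
make_demo_db(db_path)

agent_conn = open_read_only(db_path)
rows = agent_conn.execute("SELECT name FROM users").fetchall()  # reads still work
write_allowed = agent_can_write(agent_conn)                     # writes do not
```

The agent can still be told "never modify the database" in the prompt, but here the restriction holds even when it ignores that.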
Yes, that's what I've done (but still not giving it prod access, in case I screw up grants). It uses its own role / connection string with psql.
My point was just that stated rules and restrictions that the model is supposed to abide by can't be trusted. You need to assume it will occasionally do batshit stuff and make sure you are restricting its access accordingly.
Like, say you asked it to fix your RLS permissions for a specific table. That needs to go into a migration, and you need to vet it. :)
I guarantee that some people are trying "vibe sysadmining" or "vibe devopsing" and there are going to be some nasty surprises. Granted, it's usually well behaved, but it's not all that rare for it to start making bad assumptions and taking shortcuts when it can.