Hacker News

Despite multiple comments blaming the AI agent, I think it's the backups that are the problem here, right? With backups, almost any destructive action can be rolled back, whether it's from a dumb robot, a mistaken junior, or a sleep-deprived senior. Without them, you're sort of running the clock, waiting for disaster.

forgotaccount3 a month ago

Yes, backups are great but a 'dumb robot' or a 'mistaken junior' shouldn't have access to prod.

And a sleep-deprived senior? Even then, they shouldn't be able to take destructive actions on prod.

Maybe the senior can get broader access in a time-limited scope, if senior management temporarily escalates the developer's access to address a pressing production issue. But at that point, the person addressing the issue shouldn't be fighting to stay awake, nor lulled into the false sense of security of day-to-day operations.

Otherwise, only the release pipeline should have permission to take destructive actions on production, and those actions should be released as part of a peer-reviewed set of changes through the pipeline.
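A minimal sketch of the policy described above, assuming three hypothetical action categories: everyone gets read access by default, only the release pipeline may write or destroy unconditionally, and anyone else (human or AI agent) needs a management-approved escalation that expires. `Principal`, `Grant`, and `can_perform` are illustrative names, not any real IAM API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical action categories; a real setup would map these onto IAM roles.
READ = "read"
WRITE = "write"
DESTRUCTIVE = "destructive"


@dataclass
class Grant:
    """A temporary escalation approved by management (hypothetical)."""
    approved_by: str
    expires_at: datetime


@dataclass
class Principal:
    """A human, an AI agent, or the release pipeline acting on prod."""
    name: str
    is_release_pipeline: bool = False
    grant: Optional[Grant] = None


def can_perform(principal: Principal, action: str, now: datetime) -> bool:
    """Decide whether `principal` may perform `action` on production at `now`."""
    if action == READ:
        return True  # read-only access is the everyday default
    if principal.is_release_pipeline:
        return True  # changes arrive here only after peer review
    # Anyone else needs an approved, unexpired escalation for writes or destructive actions.
    grant = principal.grant
    return grant is not None and now < grant.expires_at


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
    agent = Principal("ai-agent")
    pipeline = Principal("release-pipeline", is_release_pipeline=True)
    senior = Principal("senior", grant=Grant("eng-manager", now + timedelta(hours=2)))
    print(can_perform(agent, DESTRUCTIVE, now))     # False
    print(can_perform(pipeline, DESTRUCTIVE, now))  # True
    print(can_perform(senior, DESTRUCTIVE, now))    # True
    print(can_perform(senior, DESTRUCTIVE, now + timedelta(hours=3)))  # False
```

The expiry is the important part: once the pressing issue window passes, the senior falls back to read-only without anyone having to remember to revoke anything.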

JoBrad a month ago

If a sleep-deprived senior shouldn’t have access to prod, I think we have big problems, frankly.

fragmede a month ago

Which is why, if you're Google-sized, you have follow-the-sun rotations to avoid that problem. But what about the rest of the class?

charcircuit a month ago

But smart robots like Claude should and will have access to production. Something has to be figured out to make sure operations remain smooth. The "don't do that" argument will not be a viable position to hold long term. Keeping a human in the loop is not necessary.

b112 a month ago

It is absolutely necessary. In point of fact, most DEVs don't have access to PROD either. Specialists do.

Claude, maybe, is a junior DEV.

Not a release engineer.

abustamam a month ago

Should and will are pretty large assumptions given the post we're commenting on!

> will not be a viable position to hold long term

Why not? We've literally done it without robots, smart or dumb, for years.

charcircuit a month ago

>We've literally done it without robots, smart or dumb, for years.

And we've written extremely buggy and insecure C code for decades too. That doesn't mean that we should keep doing that. AI can much faster troubleshoot and resolve production issues than humans. Putting humans in the loop will cause longer downtime and more revenue loss.

abustamam a month ago

> AI can much faster troubleshoot and resolve production issues than humans

Can, yes, with proper guardrails. The problem is that it seems like every team is learning this the hard way. It'd be great to have a magical robot that could solve all our problems without the risk of it wrecking everything. But most teams aren't there yet, and suggesting that it's THE way to go without the nuance of "btw it could delete your prod db" is irresponsible at best.

charcircuit a month ago

It didn't delete the prod db on its own; a human introduced the error. And if there had been backups, it could have fixed the mistake.

abustamam a month ago

There were backups. The AI deleted them.

charcircuit a month ago

When people talk about backups they typically mean located somewhere else. If one terraform command can take out the db and the backups then those backups aren't really separate. It's like using RAID as a backup. Sure it may help, but there are cases where you can lose everything.
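One way to make "really separate" concrete is to ask whether any single credential can delete both the primary and its backup. The sketch below assumes a hypothetical `Location` record listing an account and the roles allowed to delete there; the role names (such as the one Terraform runs as) are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical description of where some data lives and who may delete it."""
    name: str
    account: str
    deleters: frozenset  # roles/credentials with delete permission here


def shared_blast_radius(primary: Location, backup: Location) -> set:
    """Credentials that could wipe out both the primary and its backup in one go."""
    return set(primary.deleters & backup.deleters)


def backup_is_separate(primary: Location, backup: Location) -> bool:
    # A backup in the same account, or deletable by the same role (say, the one
    # Terraform runs as), fails together with the primary, much like RAID.
    return primary.account != backup.account and not shared_blast_radius(primary, backup)


if __name__ == "__main__":
    db = Location("prod-db", "prod", frozenset({"terraform", "dba"}))
    snapshots = Location("db-snapshots", "prod", frozenset({"terraform"}))
    vault = Location("backup-vault", "backup", frozenset({"backup-admin"}))
    print(shared_blast_radius(db, snapshots))  # {'terraform'}
    print(backup_is_separate(db, snapshots))   # False
    print(backup_is_separate(db, vault))       # True
```

The check is deliberately conservative: sharing an account alone counts as a failure, since account-wide credentials tend to accumulate delete rights over time.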

QuercusMax a month ago

Nobody, not even a "smart robot", should have unfettered read-write production access without guardrails. Read-only? Sure, that's a totally different story.

Read-write production access without even the equivalent of "sudo" is just insane and asking for trouble.
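A rough sketch of the "read-only by default, sudo for writes" idea as a wrapper around whatever executes SQL. It assumes a hypothetical `guarded_execute` helper and uses naive string matching, which is easy to fool; real enforcement belongs in database roles (a read-only user), and this only illustrates the explicit escalation step.

```python
import re
import sqlite3
from typing import Callable, TypeVar

T = TypeVar("T")

# Naive allow-list: only a single plain SELECT counts as read-only. String matching
# is easy to fool (e.g. Postgres "SELECT ... INTO" creates a table), so real
# enforcement belongs in database roles; this only illustrates the "sudo" step.
READ_ONLY = re.compile(r"^\s*SELECT\b", re.IGNORECASE)


class EscalationRequired(Exception):
    """Raised when a write is attempted without an explicit escalation."""


def is_read_only(sql: str) -> bool:
    body = sql.strip().rstrip(";")
    if ";" in body:  # reject stacked statements like "SELECT 1; DROP TABLE users"
        return False
    return READ_ONLY.match(body) is not None


def guarded_execute(sql: str, execute: Callable[[str], T], escalated: bool = False) -> T:
    """Run `sql` through `execute` only if it is read-only or the caller escalated."""
    if not escalated and not is_read_only(sql):
        raise EscalationRequired(f"non-read-only statement needs escalation: {sql!r}")
    return execute(sql)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER)")  # setup, outside the guard
    print(guarded_execute("SELECT count(*) FROM users", conn.execute).fetchone())  # (0,)
    try:
        guarded_execute("DROP TABLE users", conn.execute)
    except EscalationRequired as err:
        print(err)
```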

esseph a month ago

> Keeping a human in the loop is not necessary.

You don't work in anything considered Safety Critical, do you?

hobs a month ago

You need to care about your Recovery Time (how long does it take to get back up again?) and your Recovery Point (how long since your last backup was taken?), and it gets Much Worse when you start distributing state around your various cloud systems: oh, did that queue already get that message? How do we re-send it? etc.
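A back-of-the-envelope sketch of the two numbers for a single database, assuming instantaneous snapshots taken at a fixed interval; the schedule, restore time, and targets below are made-up examples. It deliberately ignores the distributed-state problem (queues, caches, other services), which is exactly where it gets worse.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Writes made after the last good backup are lost: the recovery point you actually hit."""
    return failure - last_backup


def meets_objectives(backup_interval: timedelta, restore_time: timedelta,
                     rpo_target: timedelta, rto_target: timedelta) -> bool:
    # Worst case, failure strikes just before the next backup, so the interval
    # bounds data loss (assuming instantaneous snapshots). Restore time is the RTO.
    return backup_interval <= rpo_target and restore_time <= rto_target


if __name__ == "__main__":
    nightly = timedelta(hours=24)
    print(data_loss_window(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 18, 30)))  # 18:30:00
    print(meets_objectives(nightly, timedelta(hours=3),
                           rpo_target=timedelta(hours=1), rto_target=timedelta(hours=4)))  # False
```

With nightly backups, a one-hour recovery point target is unreachable no matter how fast the restore is; you need more frequent snapshots or continuous log shipping.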

happytoexplain a month ago

They are two orthogonal issues. One doesn't make the other irrelevant.

tomcatfish a month ago

I agree that a second issue doesn't erase the first, but also I've got enough work experience to know that a system which can be brought down by 1 person no matter the tooling they use is a system not destined to last for long.

Joel_Mckay a month ago

Zero workmanship was always worth nothing.

It usually takes about 10 months for folks to have a moment of clarity. The true believers, on the other hand, often double down on the obvious mistakes. =3

clouedoc a month ago

100% agree. Everyone should always back up their production database somewhere where it's not trivial to delete.