Would you feel comfortable pushing generated code to production unaudited?

For my part, I have a company subscription for Copilot and I just use the line-based autocomplete. It’s mildly better than the built-in autocomplete. I never have it do more than that, though, and probably wouldn’t buy a license for myself.

Would you feel comfortable pushing human code to production unaudited?

depends on the human.

but i would never push llm generated code. never.

-

edit to add some substance:

if it’s someone who

* does a lot of manual local testing

* adds good unit / integration tests

* writes clear and well documented PRs

* knows the code style, and when to break it

* tests their own changes in a staging environment, independent of any QA team or reviews

* monitors the changes after they’ve gone out

* has repeatedly found things in their own PRs and asked to hold off release to fix them

* is reviewing other people’s PRs and spotting things before they go out

yea, sure, i’ll release the changes. they’re doing the auditing work for me.

they clearly care about the software. and i’ve seen enough to trust them.

and if they got it wrong, well, shit, they did everything well enough. i’m sure they’ll be on the ball when it comes to rolling it back and/or fixing it.

an llm does not do those things. an llm *does not care about your software* and never will.

i’ll take people who give a shit any day of the week.

I'd say it depends more on "the production" than the human. There are legal means to hold people accountable for their actions ("gross negligence" and all that). So you can basically always trust that people will fix what they messed up, given the chance. So if you can afford for production to be broken (e.g. the downtime will just annoy some people), you might as well allow your team to deploy straight to prod without audits. It's actually not that rare.

Only on Fridays before a three-day weekend.

Nope. But AI's sales pitch is that it's an oracle to lean on. Which is part of the problem.

As a start, let me know when an AI can fail test cases, iterate on its code to fix them, and re-submit. But I suppose that starts to approach AGI territory.
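For what it's worth, the loop being described is mechanically simple to drive from the outside. Here's a rough Python sketch of it, where `ask_model_for_patch` and `apply_patch` are hypothetical stand-ins for whatever model call and patching mechanism you'd actually wire in; nothing here is a real vendor API:

```python
# Minimal sketch of the fail-tests -> iterate -> resubmit loop.
import subprocess

MAX_ATTEMPTS = 3

def run_tests() -> subprocess.CompletedProcess:
    # Run the project's test suite and capture output to feed back.
    return subprocess.run(
        ["pytest", "--tb=short"], capture_output=True, text=True
    )

def ask_model_for_patch(failure_log: str) -> str:
    # Hypothetical: send the failing output to a model, get a diff back.
    raise NotImplementedError("wire up your model of choice here")

def apply_patch(diff: str) -> None:
    # Hypothetical: apply the returned diff to the working tree.
    subprocess.run(["git", "apply", "-"], input=diff, text=True, check=True)

for attempt in range(MAX_ATTEMPTS):
    result = run_tests()
    if result.returncode == 0:
        print("tests pass; ready to resubmit")
        break
    apply_patch(ask_model_for_patch(result.stdout + result.stderr))
else:
    print(f"still failing after {MAX_ATTEMPTS} attempts; human needed")
```

The harness itself clearly isn't the AGI part; the open question the comment raises is whether the model in the middle can actually converge on a fix rather than loop forever.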