Titles like these always make me point out the obvious: a working state is the absolute minimum requirement for any code to be merged, isn't it? Imagine merging something even though you know it's not working.

Besides, nothing in this post is specific to code produced by an LLM, and placing AI in the stated reasons feels completely arbitrary, or is rather a fallacy of our times:

- I reject [AI] code when I can’t explain the approach in my own words.

- I reject [AI] code when the diff is bigger than the problem.

- I reject [AI] code when it introduces abstractions before proving they’re needed.

- I reject [AI] code when it works locally but makes the system harder to reason about.

- I reject [AI] code when I’m trusting the output more than my understanding.

I’ve had multiple people say “you don’t work on code anymore, that’s for the AI. You work a level of abstraction above that. As long as you prove it works through testing, the code doesn’t matter anymore. It’s like looking at the assembly the compiler spits out now - who cares?”

These are the people who spit out an incredible volume of code with AI, to the point where reviews simply can’t keep up.

The last person who said this to me works in embedded, where we look at the assembly all the time. Scary.

But if the output passes the duck test, does it actually matter what's inside the black box of code?

If you're given two embedded devices and both pass the same testing, how would you tell which one was 100% AI code and which was beautifully handcrafted line by line?

Most embedded code is security / safety critical, so it gets looked at by auditors. So, then.

Also, when something invariably doesn’t work (maybe I told Claude “delay 1 sec after each swing of the axe the robot makes if the proximity sensor trips to avoid the puppy that walks across the axe’s path once every month”, and meant to type “2 sec”), I still have to go down to the level of the code sometimes. I’m sure the counterargument is “well then that just means your testing wasn’t good enough”. Sure, but I’ve never seen any project with hardware in the loop where the testing was good enough 100% of the time. Sometimes it’s hard to test once-a-month events in a regression test suite.
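To make that concrete, here's a minimal sketch of the failure mode, assuming the tests were generated from the same mistyped prompt as the code. Every name in it (`SWING_PAUSE_S`, `pause_after_swing`, `regression_suite`) is made up for illustration and doesn't come from any real robot API.

```python
# Hypothetical names throughout; this is an illustration, not real firmware.
SWING_PAUSE_S = 1.0  # the prompt said "1 sec"; the intent was 2 sec


def pause_after_swing(proximity_tripped: bool) -> float:
    """Return how long to pause after a swing, in seconds."""
    return SWING_PAUSE_S if proximity_tripped else 0.0


def regression_suite() -> bool:
    """Tests derived from the same prompt check the same wrong number."""
    return (
        pause_after_swing(True) == 1.0
        and pause_after_swing(False) == 0.0
    )


INTENDED_PAUSE_S = 2.0  # what the author actually meant (assumed here)

suite_passes = regression_suite()
meets_intent = pause_after_swing(True) >= INTENDED_PAUSE_S
```

The suite is green while the intent is missed, and the only place the typo is visible is the constant in the code itself.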

FWIW I hover around 80-90% AI-written code these days. I still look at every line of code it makes.

Even software-only projects don't have 100% test coverage.

In practice, no amount of reading code or auditing or testing gets you 100% bug-free solutions. It might be possible in principle, but nobody outside of maybe NASA will foot the bill for that.

My point is: why does it matter who or what wrote the code if errors are inevitable anyway? You plan what to do when you encounter one and limit the blast radius. If you find a process that can cut out a category of bugs, you implement it when you encounter it.

Why do we allow human-written code to have more errors than AI-generated code? Or is it just that both create different types of errors?

Well said. Replace [AI] with "junior dev" or "consultancy contractor" and these assertions have always been thus.

Fallacy or scapegoat: management asks for revised KPIs where PRs must be 10x, and AI is the "excuse" for this (unrealistic) new demand.