But if the output matches the duck typing test, does it actually matter what's inside the black box of code?
If you're given two embedded devices that both pass the same tests, how would you tell which one is 100% AI-written code and which was beautifully handcrafted line by line?
Most embedded code is security- or safety-critical, so auditors look at it. That's how, then.
Also, something will inevitably not work. Maybe I told Claude "if the proximity sensor trips, delay 1 sec after each swing of the robot's axe to avoid the puppy that walks across the axe's path once a month" when I meant to type "2 sec". When that happens, I sometimes still have to go down to the level of the code. I'm sure the counterargument is "well, then your testing wasn't good enough." Sure, but I've never seen a project with hardware in the loop where the testing was good enough 100% of the time. Once-a-month events are hard to cover in a regression test suite.
FWIW, about 80-90% of my code is AI-written these days. I still read every line it produces.
Even pure software projects don't have 100% test coverage.
No amount of code reading, auditing, or testing gets you a 100% bug-free solution. It's possible in principle, but nobody outside of maybe NASA will foot the bill for that.
My point is: why does it matter who or what wrote the code if errors are inevitable anyway? You plan for what to do when you hit one, and you limit the blast radius. When you find a process that eliminates a category of bugs, you adopt it.
Why do we tolerate more errors in human-written code than in AI-generated code? Or do the two just produce different kinds of errors?