> That's a huge problem! A showstopper for many kinds of programs!

We have automated validation and automated proofs.

Proof is necessary. Do you validate the theorem prover, or trust that it works? Do you prove the compiler is correctly compiling the program (when it matters, you should, given they do sometimes re-write things incorrectly) or trust the compiler?

> We have properly abstracted away the physics of computation. A modern computer operates in a way where, if you use it the way you've been instructed to, the physics underlying the computations cannot affect the computation in any undocumented way.

You trust the hardware the code is running on? You shouldn't.

Rowhammer comes to mind, but it's hardly the only case. The US banned some Chinese chips over unspecified concerns that something like this was going on.

For some people it's OK to run a few simple tests on the chip's output to make sure it doesn't have something like the Pentium FDIV bug; others remove the silicon wafer from the packaging and scan it with an electron microscope, verifying not just that each transistor is in the right place but also that the wires aren't close enough for currents to quantum-tunnel between them, or to act as an antenna that leaks out some part of a private key.
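The "few simple tests" end of that spectrum can be a one-liner. A minimal sketch of the classic FDIV smoke test (a hypothetical check, not a real verification suite): on the flawed Pentium the division below was off by roughly 256, while a correct FPU leaves a residue of zero or a rounding error far below 1.

```python
# Famous input pair that triggered the Pentium FDIV bug.
x, y = 4195835.0, 3145727.0

# On a correct FPU, (x / y) * y round-trips to x up to a tiny
# rounding error; on the flawed chip the residue was ~256.
residue = x - (x / y) * y
assert abs(residue) < 1.0, f"FDIV-style flaw detected: residue={residue}"
print("division smoke test passed, residue =", residue)
```

Of course, a test like this only catches the flaws someone already knows to look for, which is exactly why the paranoid reach for the electron microscope.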

Some people will go all the way down to the quantum mechanics. Exploits are possible at any level, and domains where the potential losses exceed the cost of investigation do exist, e.g. big countries and national security.

Proof is necessary. The abstraction of hardware is good enough for most of us, and given the excessive trust already given to NPM and other package management tools, LLM output that passes automated tests is already sufficient for most.

People like me who don't trust package management tools, or who filed bugs with Ubuntu for not using https enough and thought that Ubuntu's responses, and keeping the bug open for years, smelled like "we have a court order requiring this but can't admit it" (https://bugs.launchpad.net/ubuntu-website-content/+bug/15349...)… well, I can't speak for the paranoid, but I'm also the curious type who learned how to program just because the book was there next to the C64 game tapes.

> We have automated validation and automated proofs.

Example?

> Proof is necessary. Do you validate the theorem prover, or trust that it works? Do you prove the compiler is correctly compiling the program (when it matters, you should, given they do sometimes re-write things incorrectly) or trust the compiler?

I trust that the people who wrote the compiler and use it will fix mistakes. I trust the same people to discover compiler backdoors.

As for the rest of what you wrote: you're missing the point entirely. Rowhammer, the FDIV bug, they're all mistakes. And sure, malevolence also exists. But when mistakes or malevolence are found, they're fixed, or worked around, or at least documented as mistakes. With an LLM you don't even know how it's supposed to behave.

> Example?

Unit tests. Lean. Typed languages. Even more broadly, compilers.
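To make the Lean example concrete, here's a toy machine-checked proof (my own illustration, not from the thread): once the checker accepts it, the statement holds for *every* natural number, which no finite set of unit tests can promise.

```lean
-- A machine-checked proof that 0 + n = n for all naturals.
-- The kernel verifies every step; no human review needed.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                          -- base case: 0 + 0 = 0
  | succ n ih => rw [Nat.add_succ, ih]   -- step: reduce to the hypothesis
```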

> I trust the same people to discover compiler backdoors.

https://micahkepe.com/blog/thompson-trojan-horse/

> you're missing the point entirely. Rowhammer, the fdiv bug, they're all mistakes. And sure, malevolence also exists.

Rowhammer was a thing because the physics was ignored. Calling it a mistake is missing the point; it demonstrates the falseness of the previous claim:

  We have properly abstracted away the physics of computation. A modern computer operates in a way where, if you use it the way you've been instructed to, the physics underlying the computations cannot affect the computation in any undocumented way.

Rowhammer *is* the physics underlying the computations affecting the computation in a way that was undocumented prior to it getting discovered and, well, documented. Issues like this exist before they're documented, and by definition nobody knows how many unknown things like this have yet to be found.

> But when mistakes or malevolence are found, they're fixed, or worked around, or at least documented as mistakes.

If you vibe code (as in: never look at the code), then find an error with the resulting product, you can still just ask the LLM to fix that error.

I only had a limited time to experiment with this before Christmas (last few days of a free trial, thought I'd give it a go to see what the limits were), and what I found it doing wrong was piling up technical debt, not producing some mysterious ball of mud beyond its own ability to rectify.

> With an LLM you don't even know how it's supposed to behave.

LLM-generated source code: if you've forgotten how to read the source code it made for you to solve your problem, can't learn how to read it, and can't run its tests, then at that point it's about as interpretable as psychology.

The LLMs themselves: yes, this is the "interpretability" problem, people are working on that.

> Unit tests.

Not proof.
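A hypothetical example of why (my illustration, not from the thread): every unit test someone wrote down can pass while the function is still wrong for inputs nobody tested.

```python
def my_max(xs):
    # Buggy: the accumulator starts at 0 instead of xs[0],
    # so any all-negative input silently returns 0.
    m = 0
    for x in xs:
        m = max(m, x)
    return m

# These unit tests all pass...
assert my_max([1, 3, 2]) == 3
assert my_max([7]) == 7

# ...yet the spec "returns an element of xs" is violated:
print(my_max([-2, -1]))  # 0, which isn't even in the list
```

Tests sample the behaviour; a proof covers all of it.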

> Lean.

Fantastic. But what proportion of developers are ready to formalize their requirements in Lean?

> Typed languages. Even more broadly, compilers.

For sufficiently strong type systems, sure! But then we're back at the point above.

> https://micahkepe.com/blog/thompson-trojan-horse/

I am of course aware. Any malevolent backdoor in your compiler could also exist in your LLM, or in the compiler that compiled the LLM. So you can never do better.

> Rowhammer is the physics underlying the computations affecting the computation in a way that was undocumented prior to it getting discovered and, well, documented. Issues like this exist before they're documented, and by definition nobody knows how many unknown things like this have yet to be found.

Yep. But it's a bug. It's a mistake. The unreliability of LLMs is not.

> If you vibe code (as in: never look at the code), then find an error with the resulting product, you can still just ask the LLM to fix that error.

Of course. But you need skills to verify that it did.

> LLM generated source code: if you've forgotten how to read the source code it made for you to solve your problem and can't learn how to read that source code and can't run the tests of that source code, at which point it's as interpretable as psychology.

Reading source code is such a minute piece of the task of understanding code that I can barely understand what you mean.