> review the ASM that GCC generates (we don't)

Of course we do not. Because there is no need. The process of compiling higher order language to assembly is deterministic and well-tested. There is no need to continue reviewing something that always yields the same result.

> We care that it works, and is correct for what it is supposed to do.

Exactly. Which is something we do not have with the output of an LLM, because it can misunderstand or hallucinate.

Therefore, we always have to review it.

That is the difference between the output of compilers and the output of LLMs.

This. The comparison between compilers and LLMs is so utterly incorrect, and yet I've heard it multiple times already in the span of a few weeks. The people suggesting this are probably unaware of the fact that Turing-complete languages follow mathematical properties, not just vibes. You can trust the output of your compiler because it was thoroughly tested to ensure it acts as a Turing machine that converts one Turing-complete language (C, C++, whatever) into another Turing-complete language (ASM), and there's a theorem that guarantees such a conversion is always possible. LLMs are probabilistic machines and it's grossly inappropriate to put them in the same category as compilers - it would be like saying that car tires and pizzas are similar because they're both round and have edges.

> The process of compiling higher order language to assembly is deterministic and well-tested.

Here are the miscompilation bugs reported in GCC so far in 2026, i.e. the ones labeled "wrong-code".

https://gcc.gnu.org/bugzilla/buglist.cgi?chfield=%5BBug%20cr...

I count 121 of them.

I've posted this 3 times now. Code-generation by compilers written by experts is not deterministic in the way that you think it is.

In the 12+ years I've been a professional developer, I can only remember two bugs that were caused by the compiler / interpreter. Everything else was logic bugs, oversights, 3rd-party libraries, misunderstanding of the requirements, internal contradictions in the requirements, etc.

So that's maybe 0.1% of all the bugs I've touched.

In that sense, code generation isn't really an interesting source of bugs for the discussion at hand.

There were more of these 26+ years ago. gcc and egcs had some subtle register allocator bugs that got tripped up under heavy register pressure on i386, and they were the bane of my existence as a kernel developer at the time.

It's close enough. If we had a build pipeline that kept prompts in source control and ran them through an LLM and then a compiler to produce the build output, it would fall over constantly. You'd get radically different results every time. Build pipelines that store actual source code in source control, then run it through a compiler to produce the build output, are used all over the place, and they generally work great.
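To make the pipeline point concrete, here's a toy sketch of a reproducibility check: run the same build step several times and count the distinct output hashes. Everything in it is made up for illustration. `toy_compile` and `toy_generate` are hypothetical stand-ins, not a real compiler or a real LLM API; the only assumption is that one is a pure function of its input and the other samples its output.

```python
import hashlib
import random


def toy_compile(source: str) -> bytes:
    """Hypothetical stand-in for a compiler: a pure function of its input."""
    return source.upper().encode()


def toy_generate(prompt: str, rng: random.Random) -> bytes:
    """Hypothetical stand-in for a sampling LLM: output depends on random draws."""
    words = ["add", "sub", "mul", "div", "nop", "jmp", "ret", "mov"]
    return " ".join(rng.choice(words) for _ in range(16)).encode()


def distinct_outputs(build, runs: int = 5) -> int:
    """Run a build step several times and count the distinct output hashes."""
    return len({hashlib.sha256(build()).hexdigest() for _ in range(runs)})


rng = random.Random()  # unseeded on purpose, like sampling at nonzero temperature

# Same input every run: the pure step always hashes to the same value.
compiled = distinct_outputs(lambda: toy_compile("int main(void) { return 0; }"))

# Same prompt every run: the sampled step almost surely hashes differently.
generated = distinct_outputs(lambda: toy_generate("write main", rng))

print(f"compiler-like step: {compiled} distinct output(s)")
print(f"sampler-like step:  {generated} distinct output(s)")
```

A pipeline can only cache, diff, or bisect build outputs if the first count stays at 1.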

If the "bug" shows up every time in the output given the same input, then it definitely is deterministic.

Just because there are bugs does not mean a compiler is non-deterministic. I looked through a bunch of the bug reports and there is nothing there that can't be fixed to make it deterministic.
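A toy sketch of what "buggy but deterministic" looks like. This is not taken from any of the GCC reports; `fold_division` is a hypothetical constant folder with a deliberately planted bug, using the fact that Python's `//` floors while C integer division truncates toward zero.

```python
def fold_division(a: int, b: int) -> int:
    """Hypothetical constant folder for C's `a / b`, with a planted bug:
    Python's // floors, while C integer division truncates toward zero."""
    return a // b


def c_division(a: int, b: int) -> int:
    """Reference semantics: truncate toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


# Feed the buggy folder the same input a thousand times.
results = {fold_division(-7, 2) for _ in range(1000)}

print(results)            # a single wrong answer, identical on every run
print(c_division(-7, 2))  # what a correct folder would have produced
```

The output is wrong, but it is the same wrong answer on every run, which is exactly what makes it reportable, reproducible, and fixable.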

You can't fix an LLM to be absolutely deterministic, but you can fix a compiler.