The difference is that a compiler is (generally) deterministic. It will always do the same thing, given the same inputs and circumstances.

An LLM is not; it's probabilistic text generation. It will write out 'the earth is a spheroid' if that's the most common response to the input 'what shape is the earth'. But it does not understand what it is writing. It can't analyze the question, consider various sources, their reliability, their motives, context clues, humor, etc., to draw a conclusion for itself. It can't make a mistake and then learn from that mistake when corrected.
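
A minimal sketch of what "probabilistic text" means here, with made-up numbers rather than a real model: the model assigns probabilities to possible next tokens and the output is sampled from that distribution, so identical prompts need not produce identical answers.

```python
import random

# Hypothetical next-token probabilities a model might assign after the prompt
# "what shape is the earth" -- the numbers are invented for illustration.
next_token_probs = {
    "spheroid": 0.90,
    "round": 0.07,
    "an oblate spheroid": 0.025,
    "flat": 0.005,
}

def sample_answer(probs):
    # Sampling draws from the distribution, so repeated calls with the
    # identical prompt can return different answers -- unlike a compiler.
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_answer(next_token_probs))
print(sample_answer(next_token_probs))
```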

Probabilistically, why does that matter, if it says the Earth is round vs. the Earth is a marble vs. the Earth is a warm blue dot in the vast oceans of space? Like, there's the CS definition of 100% totally deterministic, and then there's reality, where things just need to be good enough.

What if 0.5% of the time it says that the Earth is flat? Used millions of times per day, it will tell thousands of people that the Earth is actually flat, and may convince some of them of this falsehood.
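
The rough arithmetic behind that, with an assumed daily volume purely for illustration:

```python
queries_per_day = 5_000_000   # assumed usage volume, purely illustrative
error_rate = 0.005            # "0.5% of the time it says the Earth is flat"

wrong_answers_per_day = queries_per_day * error_rate
print(wrong_answers_per_day)  # 25000.0 -- tens of thousands of bad answers per day
```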

That's a pretty good one, but I think a better question to challenge me with is: what if 1% of the time Claude Code does `rm -rf ~`, which has been going around? Some people are just gonna jump. Some will make it, some won't. I have backups.

There is no reason to believe an LLM answers a question with the most common answer on the internet.

If that were even true by default, it'd be easy to change: just take the pages with more correct answers and feed them in multiple times.

Whatever shows up most commonly in the training data is what an LLM will output. It's more complicated than that, of course, but that's the basic idea.

And I think you missed the point. If you knew which answers were 'correct' and which were 'incorrect', then you could avoid the problem altogether. But that would mean someone would have to curate the entire internet, looking for anything that's 'incorrect' (or intended as humor) and making sure it doesn't end up in the training data, along with any LLM-generated content, to avoid cascading failures.

That's an unbelievable amount of work. It's essentially impossible, no matter how much money you throw at it. There's so much content being made every day that you couldn't even keep up with what's being added, let alone what's already there.

> Whatever shows up most commonly in the training data is what an LLM will output. It's more complicated than that, of course, but that's the basic idea.

The most common thing in the training data is the letter 'e'. If you're going to explain how an LLM works, the explanation needs to cover why it's able to form sentences at all.

In particular, answering questions is a behavior that only appears after posttraining, and the posttraining objective has absolutely nothing to do with what's "most common" in the pretraining data.
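
A rough sketch of that distinction, not any lab's actual training code (the `ToyModel` is a stand-in so the snippet runs without an ML framework): pretraining is next-token prediction over raw text, good and bad pages alike, while supervised fine-tuning in posttraining only takes loss on curated answers conditioned on a prompt.

```python
import math
import random

class ToyModel:
    """Stand-in for a language model: returns a made-up log-probability."""
    def log_prob(self, next_token, context):
        return math.log(random.uniform(0.01, 1.0))

def pretraining_loss(model, text_tokens):
    # Pretraining: predict token t+1 from tokens <= t over raw web text.
    # This is where "frequency in the training data" matters.
    total = sum(
        -model.log_prob(text_tokens[t + 1], text_tokens[: t + 1])
        for t in range(len(text_tokens) - 1)
    )
    return total / (len(text_tokens) - 1)

def sft_loss(model, prompt_tokens, answer_tokens):
    # Posttraining (supervised fine-tuning): loss is taken only on a curated
    # answer, conditioned on the prompt. Question-answering behavior is shaped
    # by these curated pairs, not by whatever was most common in the crawl.
    context = list(prompt_tokens)
    total = 0.0
    for tok in answer_tokens:
        total += -model.log_prob(tok, context)
        context.append(tok)
    return total / len(answer_tokens)

model = ToyModel()
print(pretraining_loss(model, ["the", "earth", "is", "a", "spheroid"]))
print(sft_loss(model, ["what", "shape", "is", "the", "earth", "?"],
               ["an", "oblate", "spheroid"]))
```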

> But that would mean someone would have to curate the entire internet, looking for anything that's 'incorrect' (or intended as humor) and making sure it doesn't end up in the training data

Show the LLM the source URL during pretraining so it can cluster documents from the same source together.

https://arxiv.org/abs/2501.01956
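
A toy sketch of what that could look like, not the paper's actual pipeline (the documents and tag format below are invented): prepend source metadata to each pretraining document so the model sees where the text came from.

```python
# Illustrative only: prefix each pretraining document with its source URL so
# the model can associate content with its origin.
raw_docs = [
    {"url": "https://en.wikipedia.org/wiki/Earth",
     "text": "Earth is an oblate spheroid."},
    {"url": "https://example-forum.invalid/thread/42",
     "text": "the earth is flat, wake up"},
]

def with_source_prefix(doc):
    # The prefix becomes ordinary tokens in the training stream, so documents
    # from the same site end up sharing a recognizable context.
    return f"<source:{doc['url']}>\n{doc['text']}"

training_stream = [with_source_prefix(d) for d in raw_docs]
for example in training_stream:
    print(example)
    print("---")
```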

The cheap version of this technique is to find trustworthy text (Wikipedia, answers you paid people to write, highly upvoted Reddit comments) and train on it more than once. The rest falls out through emergent magic (reliable sources have different writing styles than unreliable ones, and RL points the model to the part of latent space with the reliable sources, or something).
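
"Train on it more than once" is literally just repeating the trusted documents in the training mix. A toy sketch with invented weights:

```python
# Toy upsampling: repeat documents from sources you trust so they contribute
# proportionally more updates. The sources, texts, and counts are made up.
corpus = [
    {"source": "wikipedia",      "text": "Earth is an oblate spheroid.",   "repeats": 3},
    {"source": "paid_answers",   "text": "The Earth is roughly a sphere.", "repeats": 3},
    {"source": "random_webpage", "text": "the earth is flat",              "repeats": 1},
]

training_mix = []
for doc in corpus:
    training_mix.extend([doc["text"]] * doc["repeats"])

# The "flat" page is still in there, but it's outnumbered 6-to-1.
print(training_mix)
```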

Besides that, if it encounters a 95%/5% split of right/wrong answers to some question during training, that will have a different effect than a 100%/0% split would. It does know when something is debated.
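
A crude way to see the difference, assuming (a big simplification) that the sampled answers simply mirror the training mixture; the numbers are made up:

```python
import collections
import random

def simulate_answers(mixture, n=10_000):
    # Pretend the model's answer distribution mirrors the training mixture.
    answers, weights = zip(*mixture.items())
    return collections.Counter(random.choices(answers, weights=weights, k=n))

print(simulate_answers({"spheroid": 100, "flat": 0}))  # never says "flat"
print(simulate_answers({"spheroid": 95, "flat": 5}))   # says "flat" ~5% of the time
```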