Unless the LLM is a base model, or a base model with only supervised finetuning, it definitely doesn't predict words just based on how likely they are in similar sentences it was trained on. Reinforcement learning is a thing, and all models nowadays are extensively trained with it.
If anything, they predict words based on a heuristic ensemble of what word is most likely to come next in similar sentences and what word is most likely to lead to a higher final reward.
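Concretely, the objective used in RLHF-style post-training is usually written as expected reward minus a KL penalty that keeps the policy close to the pretrained next-token distribution, so both "likely next word" and "likely to earn reward" show up in the same formula. A toy sketch (the helper name and all numbers are invented; this is not anyone's actual training code):

    def kl_regularized_score(policy_logprob: float,
                             base_logprob: float,
                             reward: float,
                             beta: float = 0.1) -> float:
        """Per-sample term of 'reward minus beta * KL(policy || base)':
        the reward, minus a penalty for drifting away from the pretrained
        next-token distribution."""
        return reward - beta * (policy_logprob - base_logprob)

    # Two completions with the same reward: the one that stays close to the
    # pretrained distribution scores higher overall than the one that drifts.
    print(kl_regularized_score(policy_logprob=-2.0, base_logprob=-2.5, reward=1.0))  # 0.95
    print(kl_regularized_score(policy_logprob=-0.5, base_logprob=-6.0, reward=1.0))  # 0.45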
> If anything, they predict words based on a heuristic ensemble of what word is most likely to come next in similar sentences and what word is most likely to lead to a higher final reward.
So... "finding the most likely next word based on what they've seen on the internet"?
Reinforcement learning is not done with random data found on the internet; it's done with curated, high-quality labeled datasets. There have been approaches that try to apply reinforcement learning to pre-training[1] (learning a predict-the-next-sentence objective in an unsupervised way), but as far as I know they don't scale.
[1] https://arxiv.org/pdf/2509.19249
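To make "curated, labeled datasets" concrete: the reward models used in RLHF are typically fit on human preference pairs (a chosen answer vs. a rejected one) with a Bradley-Terry style loss, which is a very different artifact from raw web text. A toy sketch with invented scores:

    import math

    def preference_loss(score_chosen: float, score_rejected: float) -> float:
        """-log sigmoid(chosen - rejected): small when the reward model ranks
        the human-preferred answer above the rejected one."""
        return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

    # Each pair is (reward-model score for the preferred answer,
    #               reward-model score for the rejected answer).
    pairs = [(2.1, -0.3), (0.4, 1.2)]
    print(sum(preference_loss(c, r) for c, r in pairs) / len(pairs))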
You know that when A. Karpathy released NanoLLM (or whatever it was called), he said it was mainly coded by hand, as the LLMs were not helpful because "the training dataset was way off". So yeah, your argument actually "reinforces" my point.
No, your opinion is wrong, because the reason some models don't seem to have a "strong opinion" on anything is not related to predicting words based on how similar they are to other sentences in the training data. It's most likely related to how the model was trained with reinforcement learning, and more specifically to recent efforts by OpenAI to reduce hallucination rates by penalizing guessing under uncertainty[1].
[1] https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4a...
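The gist of that report, in toy form: if grading gives +1 for a correct answer, 0 for abstaining, and a penalty for a wrong answer, then answering only pays off above a confidence threshold, so the model is pushed toward "I don't know" instead of a confident guess. A rough sketch (the penalty value is made up, not OpenAI's actual grading scheme):

    def expected_score(p_correct: float, wrong_penalty: float = 2.0) -> float:
        """Expected score of answering, versus abstaining (which scores 0)."""
        return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

    for p in (0.9, 0.6, 0.3):
        action = "answer" if expected_score(p) > 0 else "say 'I don't know'"
        print(f"confidence {p:.1f}: {action}")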
Well, you do understand that the "penalising", or as the ML community likes to call it, "adjusting the weights downwards", is part of setting up the evaluation functions for, gasp, calculating the next most likely tokens, or to be more precise, the tokens with the highest probability? You are effectively proving my point, perhaps in a somewhat hand-wavy fashion that can nevertheless still be translated into technical language.
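To spell that out in code terms: however the weights were adjusted, whether by cross-entropy on web text or by an RL reward, inference still boils down to turning the current logits into a next-token distribution and picking from it. A toy softmax with invented numbers:

    import math

    def next_token_probs(logits: dict) -> dict:
        """Softmax over the logits the (however-trained) model produced:
        this is the 'tokens with the highest probability' step."""
        z = sum(math.exp(v) for v in logits.values())
        return {tok: math.exp(v) / z for tok, v in logits.items()}

    print(next_token_probs({"yes": 2.0, "no": 0.5, "maybe": -1.0}))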
You do understand that the mechanism through which an autoregressive transformer works (predicting one token at a time) is completely unrelated to how a model with that architecture behaves or how it's trained, right? You can have both:
- An LLM that works through completely different mechanisms, like predicting masked words, predicting the previous word, or predicting several words at a time.
- A normal traditional program, like a calculator, encoded as an autoregressive transformer that calculates its output one word at a time (compiled neural networks) [1][2]
So saying "it predicts the next word" is a nothing-burger. That a program calculates its output one token at a time tells you nothing about its behavior.
[1] https://arxiv.org/pdf/2106.06981
[2] https://wengsyx.github.io/NC/static/paper_iclr.pdf
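To make the compiled-program point concrete, here is a deliberately toy sketch in plain Python (it has nothing to do with the actual constructions in [1][2]): an exact calculator exposed through an "emit one token at a time" interface. Generating the answer one token at a time does not make it any less correct.

    def next_token(prompt: str, so_far: str) -> str:
        """Autoregressive wrapper around an exact adder: return the next
        character of the answer, or "" once the answer is complete."""
        a, b = (int(x) for x in prompt.split("+"))
        answer = str(a + b)
        return answer[len(so_far)] if len(so_far) < len(answer) else ""

    out = ""
    while (tok := next_token("123+456", out)):
        out += tok          # produced one token at a time...
    print(out)              # ...and still exactly 579, every time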
> So saying "it predicts the next word" is a nothing-burger. That a program calculates its output one token at a time tells you nothing about its behavior.
Well, it does - it tells me it is utterly unreliable, because it does not understand anything. It just goes on, shitting out a nice pile of tokens that, placed one after another, kind of look like coherent sentences but make no sense, like "you should absolutely go on foot to the car wash". A completely logical culmination of Bill Gates' idiotic "Content is King" proclamation of 20 years ago.
No, you can't know that the output of a program is unreliable just from the fact that it outputs one word at a time. I already told you that you can perfectly compile a normal program, like a calculator, into the weights of an autoregressive transformer (this comes from lines of work like RASP, ALTA, and tracr). And by this I don't mean "approximating the output of a calculator with 99.999% accuracy"; I mean "it deterministically gives exactly the same output as a calculator, 100% of the time, for all possible inputs".
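And to be explicit about what "exactly the same output for all possible inputs" means operationally: on a bounded toy domain you can simply enumerate every input and check for disagreement. The stand-in below trivially computes the sum itself, so it proves nothing about transformers; it only illustrates the kind of exhaustive check that separates "exact" from "99.999% accurate".

    def emit_tokens(a: int, b: int):
        """Stand-in for a compiled, deterministic model that emits the digits
        of a + b one token at a time."""
        yield from str(a + b)

    # Exhaustive check over the whole toy domain, not a sampled accuracy estimate.
    assert all("".join(emit_tokens(a, b)) == str(a + b)
               for a in range(100) for b in range(100))
    print("exact on all 10,000 inputs in the toy domain")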
> No, you can't know that the output of a program is unreliable just from the fact that it outputs one word at a time
Yes I can, and it shows every time the "smart" LLMs suggest we take a walk to the car wash or claim that 1.9 < 1.11, etc...