The longer term for this is "stochastic parrot". See another HN comment here comparing LLMs to theater actors or movie actors.
LLMs just spew words. It just so happens that human beings can decode them into something related, useful, and meaningful surprisingly often.
Might even be a useful case of pareidolia (a term I dislike, because a world without any pattern matching whatsoever would not necessarily be "better").
I dislike the term “stochastic parrot”, because there’s plenty of evidence that LLMs do have an understanding of at least some things that they are saying.
We can trace which neurons activate for a face recognition model and see that a certain neuron does light up when it sees a face. Similarly, the expected features light up for a sentence like “the word parrots is plural”.
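Here's a rough sketch of the mechanics being described, using a toy PyTorch model and synthetic inputs as stand-ins for a real face-recognition network (so this only illustrates the probing technique, not an actual interpretability result):

    # Toy sketch: record hidden activations with a forward hook and check whether
    # any single unit separates two classes of inputs. Model and data are synthetic.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

    activations = {}
    def save_hidden(module, inputs, output):
        activations["hidden"] = output.detach()
    model[1].register_forward_hook(save_hidden)  # record post-ReLU activations

    # Pretend class A ("faces") and class B ("non-faces") differ along one direction.
    direction = torch.randn(64)
    class_a = torch.randn(100, 64) + 2.0 * direction
    class_b = torch.randn(100, 64) - 2.0 * direction

    model(class_a)
    acts_a = activations["hidden"]
    model(class_b)
    acts_b = activations["hidden"]

    # A unit whose mean activation differs a lot between the classes is a candidate
    # "class A unit" -- the same logic used when probing real vision models.
    gap = (acts_a.mean(dim=0) - acts_b.mean(dim=0)).abs()
    print("most class-selective unit:", gap.argmax().item())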
If you stop assuming the LLMs have no internal representations of the data, then everything makes a lot more sense! The LLM is FORCED to answer questions… just like a high school student filling out the SAT is forced to answer questions.
If a high school student fills out the wrong answer on the SAT, is that a hallucination?
Hallucinations are expected behaviors if you RLHF a high schooler to always guess an answer on the SAT because that’ll get them the highest score. This applies to ML model reward functions as well.
This matches what we know about LLMs and hallucination-avoidance behavior in LLMs.
"Wrong answers on SAT" is also the leading hypothesis on why o3 was such an outlier - far more prone to hallucinations than either prior or following OpenAI models.
On the SAT, a random guess is right 20% of the time - more if you can rule out at least one obviously wrong answer. Saying "I don't know" and not answering is right 0% of the time. So if you run RLVR on SAT-type tests, where any answer is better than no answer, you encourage hallucinations. Hallucination avoidance is a fragile capability in LLMs, and OpenAI probably fried o3 with too much careless RLVR.
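A back-of-the-envelope sketch of that incentive (the reward values here are illustrative, not any lab's actual reward function):

    # Expected reward on a 5-choice multiple-choice question, under a reward of
    # 1 for a correct answer and 0 for everything else (including "I don't know").
    def expected_reward(p_right, reward_right=1.0, reward_wrong=0.0,
                        reward_abstain=0.0, abstain=False):
        if abstain:
            return reward_abstain
        return p_right * reward_right + (1 - p_right) * reward_wrong

    p_blind_guess = 1 / 5        # random answer
    p_after_elimination = 1 / 4  # one obviously wrong option ruled out

    print(expected_reward(p_blind_guess))                # 0.20 -> guessing pays
    print(expected_reward(p_after_elimination))          # 0.25 -> guessing pays more
    print(expected_reward(p_blind_guess, abstain=True))  # 0.00 -> "I don't know" never pays
    # With a penalty for wrong answers (e.g. -0.25, like the old SAT), blind guessing
    # is only break-even with abstaining:
    print(expected_reward(p_blind_guess, reward_wrong=-0.25))  # ~0.0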
But another cause of hallucinations is limited self-awareness of modern LLMs. And I mean "self-awareness" in a very mechanical, no-nonsense fashion: "has information about itself and its own capabilities". LLMs have very little of that.
Humans have some awareness of the limits of their knowledge - not at all perfect, but at least there's something. LLMs get much, much less of that. LLMs learn the bulk of their knowledge from pre-training data, but pre-training doesn't teach them a lot about where the limits of their knowledge lie.
> But another cause of hallucinations is limited self-awareness of modern LLMs… Humans have some awareness of the limits of their knowledge
Until you said that, I didn’t realize just how much humans “hallucinate” in just the same ways that AI does. I have a friend who is fluent in Spanish, a native speaker, but got a pretty weak grammar education when he was in high school. He also got no formal education in critical thinking. So this guy is really, really fluent in his native language, but can often have a very difficult time explaining why he uses whatever grammar he uses. I think the whole world is realizing how poorly our brains can explain and identify the grammar we use flawlessly.
He helps me improve my Spanish a lot, and he can correct me with 100% accuracy, of course. But I’ve noticed on many occasions, including this week, that when I ask why he said something one way or another in Spanish, he will just make up some grammar rule that doesn’t actually exist.
He said something like “you say it this way when you really know the person and you’re saying that the other way when it’s more formal”, but I think really it was just a slangy way to mis-stress something and it didn’t have to do with familiar/formal or not. I’ve learned not to challenge him on any of these grammar rules he makes up, because he will dig his heels in, and I’ve learned just to ignore him, since he won’t even remember the made-up rule in a week anyway.
This really feels like a very tight analogy with what my LLM does to me every day, except that when I challenge the LLM it will profusely apologize and declare itself incorrect even if it had been correct after all. Maybe LLMs are a little bit too humble.
I imagine this is a very natural tendency in humans, and I imagine I do it much more than I’m aware of. So how do humans use self-awareness to reduce the odds of this happening?
I think we mostly get trained in higher education to not trust the first thought that comes into our head, even if it feels self consistent and correct. We eventually learn to say “I don’t know” even if it’s about something that we are very, very good at.
Spanish in particular has more connotations per word than English. It's not even the grammar or spelling; those have rules and that's that. But choosing appropriate words is more like every word having its right place, time, and context. Some close examples would be the N-word or the R-word in English, as they are steeped in meanings far beyond the literal.
> He said something like “you say it this way when you really know the person and you’re saying that the other way when it’s more formal”, but I think really it was just a slangy way to mis-stress something and it didn’t have to do with familiar/formal or not.
There’s such a thing in Spanish and in French: formality of the setting is reflected in the language. French even distinguishes between three different levels of vocabulary - one for very informal settings (close friends), one for business and daily interactions, and one for very formal settings. It’s all cultural.
> We can trace which neurons activate for a face recognition model and see that a certain neuron does light up when it sees a face.
Seeing which parts of a model (they aren't neurons) light up when shown a face doesn't necessarily indicate understanding.
The model is a complex web of numbers representing a massively compressed data space. It could easily be that what you see light up when shown a face only indicates which specific part of the model houses the compressed data related to recognizing specific facial features.
I think this could be seen as a proxy for evidence that there's some degree of reasoning, if we think we can identify specialized features that always become involved in certain kinds of outputs. It's not proof, but it's not nothing either. It has some parallels with how research on human brains is conducted, right?
It does have parallels to the human brain, absolutely. We've been studying the human brain in similar ways for much longer though and we still don't know much about it.
We do know what areas of a human brain often light up in response to various conditions. We don't know why that is though, or how it actually works. Maybe more importantly for LLMs, we don't know how human memory works, where it is stored, how to recognize or even define consciousness, etc.
Seeing what areas of a brain or an LLM light up can be interesting, but I'd be very cautious trying to read much into it.
> Seeing which parts of a model (they aren't neurons)…
I thought models were composed of neural network layers, among other things. Are these data structures called something different?
That point may not have been relevant for me to include.
I was getting at the idea that a neuron is a very specific feature of a biological brain; regardless of what AI researchers may call it, their hardware isn't made of neurons.
1. They are neurons, whether you like it or not. A binary tree may not have squirrels living in it, but it's still a tree, even though the word "tree" here is defined differently than in biology. Or are you going to say a binary tree is not a tree?
2. You are about 5 years behind in terms of the research. Look into hierarchical feature representation and how MLP neurons work (or even older CNNs and RNNs, etc). And I'm deliberately using the word "neuron" instead of "feature" here because, while I know "feature" is more correct in general, there are definitely small toy models where you can pinpoint an individual neuron that represents a feature such as a face.
What were you getting at with the MLP example? MLPs do a great job with perception tasks, and I get that the term neuron is used frequently there. I disagree with the use of the name, that's all; similarly, I disagree that LLMs are AI, but here we are.
Using the term neuron there and meaning it literally is like calling an airplane a bird. I get that the colloquial use exists, but no one thinks they are literal birds.
Do you also disagree with the use of the name “tree” in a computer science class?
Again, nobody thinks trees in computer science contain squirrels, nobody thinks airplanes are birds, and nobody thinks a neuron in an ML model contains axons and dendrites. This is a weird hill to die on.
Are you gonna complain that the word “photograph” means “light writing” but in reality nobody is writing anything, so therefore the word is wrong?
I would disagree with anyone that wants to say they are the same as a natural tree, sure.
I don't believe the term photograph was repurposed when cameras were invented, so that example doesn't fit.
More importantly, I argued that neuron has a very specific biological meaning, and it's a misuse to apply the term to what is ultimately running on silicon.
Your claim was that they are neurons, period. You didn't expand on that further, which reads as a pretty literal use of the term to me. We're discussing online in text, so that reading of your comment could be completely wrong; that's fine. But I stand by my point that what is inside an LLM or a GPU is not a neuron.
https://en.m.wikipedia.org/wiki/Artificial_neuron
What's your point with that link? I'm well aware that people use the term neuron in AI research and acknowledged as much a few comments up. I disagree with the use of the term; I'm not arguing that the term isn't used.
>I dislike the term “stochastic parrot”, because there’s plenty of evidence that LLMs do have an understanding of at least some things that they are saying.
It's bold to use the term "understanding" in this context. You ask it something about a topic, and it gives an answer like someone who understands the topic. Then you change the prompt slightly, in a way where a human who understands the topic would still trivially give the right response, and the LLM outputs an answer that is wrong or irrelevant, and wrong in an unpredictable, non-human way - no human who exhibited understanding with the first answer could be expected to answer the second question in such a bizarre manner.
The fact that the LLM can be shown to have some sort of internal representation does not necessarily mean that we should call this "understanding" in any practical sense when discussing these matters. I think it's counterproductive in getting to the heart of the matter.
> Then you change the prompt slightly, in a way where a human who understands the topic would still trivially give the right response, and the LLM outputs an answer that is wrong or irrelevant, and wrong in an unpredictable, non-human way - no human who exhibited understanding with the first answer could be expected to answer the second question in such a bizarre manner.
I think this should make you question whether the prompt change was really as trivial as you imply. Providing an example of this would elucidate.
Here's an entire paper [0] showing the impact of extremely minor structural changes on the quality of the results of the model. Things as simple as not using a colon in the prompt can lead to notably degraded (or improved) performance.
0. https://arxiv.org/pdf/2310.11324
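A rough sketch of the kind of formatting variation the paper measures; the template pieces here are made up, and the scoring step is left as a stub because it depends on whichever model and benchmark you plug in:

    # Enumerate semantically equivalent prompt formats that differ only in
    # punctuation, spacing, and casing -- the "spurious features" the paper varies.
    from itertools import product

    field_separators = [": ", " - ", ":\n"]   # "Question: ..." vs "Question - ..." etc.
    item_separators  = ["\n", "\n\n"]
    casings          = [str, str.upper]       # "Answer" vs "ANSWER"

    def render(question, fs, item_sep, case):
        return item_sep.join([case("Question") + fs + question,
                              case("Answer") + fs.rstrip()])

    for fs, item_sep, case in product(field_separators, item_separators, casings):
        prompt = render("Is the word 'parrots' plural?", fs, item_sep, case)
        # score = run_benchmark(prompt)  # hypothetical stub: score each format with your model
        print(repr(prompt))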
> LLMs just spew words. It just so happens that human beings can decode them into something related, useful, and meaningful surprisingly often.
This sentence is inherently contradictory. If LLM output is meaningful more often than chance would allow, then it's literally not "just spewing words". Therefore whatever model it is using to generate that meaning must contain some semantic content, even if it's not semantic content as rich as humans are capable of. The "stochastic parrot" term is thus silly.
It's a sufficiently large N shannonizer. Nothing more
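For reference, a "shannonizer" in the classic sense is just an order-N Markov chain over observed text. A minimal sketch of that baseline, to make the comparison concrete (the toy corpus is made up):

    # Order-N "shannonizer": sample each next word from the empirical
    # distribution of words that followed the previous N words in the corpus.
    import random
    from collections import defaultdict

    def build_chain(words, n=2):
        chain = defaultdict(list)
        for i in range(len(words) - n):
            chain[tuple(words[i:i + n])].append(words[i + n])
        return chain

    def generate(chain, seed, length=20):
        out = list(seed)
        for _ in range(length):
            followers = chain.get(tuple(out[-len(seed):]))
            if not followers:
                break
            out.append(random.choice(followers))
        return " ".join(out)

    corpus = ("the parrot repeats the words it has heard "
              "the parrot does not know what the words mean").split()
    chain = build_chain(corpus, n=2)
    print(generate(chain, seed=("the", "parrot")))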
so are we, maybe.
Definitely true at some level:
https://en.wikipedia.org/wiki/Predictive_coding
But... but... humans do the same thing? This is input/output, with the output being formed by applying the function of our life experiences (training) to the input. They just don't have the hormonal circuitry to optimise for whatever our bodies are trying to optimise for when we make decisions.
People actually understand the input and the output. An LLM understands neither; it's generating output that is statistically likely, within some bounds. As Fowler said, it's a pleasant coincidence that some of this output has value to us.
(For sure, arguments can be made that the relationships between the terms that the model has encoded could maybe be called "model thinking" or "model understanding", but it's not how people work.)
We do the same thing. We pick words that are statistically likely to get us what we want. And much of it is unconscious. You don't formally reason about every word you speak. You are focused on your objective, and your brain fills in the gaps.
We absolutely do not "pick words that are statistically likely to get us what we want." We use words to try to articulate (to varying levels of success) a message that we want to communicate. The words, tone, speed, pitch, etc. all convey meaning.
> And much of it is unconscious.
That does not mean we're "picking words statistically likely to get us what we want," it means "our brains do a lot of work subconsciously." Nothing more.
> You are focused on your objective, and your brain fills in the gaps.
This is a total contradiction of what you said at the start. LLMs are not focused on an objective, they are using very complex statistical algorithms to determine output strings. There is no objective to an LLM's output.
The LLM objective is whatever they are trained to do, whether it's completing text, obeying instructions, coding, etc.
In pre-training, we feed them a lot of human-written text. This allows them to learn the rules of language and grammar and common language patterns. At this stage, the objective is to predict the next token that makes sense to human beings.
Examples: "The capital of the US is ..." "Why did the chicken ..."
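A minimal sketch of that pre-training objective with a toy PyTorch model (the vocabulary size, random token ids, and missing transformer stack are simplifications; the loss is the standard next-token cross-entropy):

    # Next-token prediction: at every position, the model is scored on how much
    # probability it assigns to the token that actually came next in the text.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, d_model = 1000, 64
    tokens = torch.randint(0, vocab_size, (1, 16))  # stand-in for a tokenized sentence

    embed = nn.Embedding(vocab_size, d_model)
    lm_head = nn.Linear(d_model, vocab_size)        # real models put a transformer in between

    hidden = embed(tokens)                          # (batch, seq, d_model)
    logits = lm_head(hidden)                        # (batch, seq, vocab)

    # Predict token t+1 from position t: shift the targets left by one.
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab_size),
                           tokens[:, 1:].reshape(-1))
    print(loss.item())  # pre-training pushes this number down across a huge corpus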
The next step is instruct training, where they are trained to follow instructions. At this point, they are predicting the next token that will satisfy the user's instructions. They are rewarded for following instructions.
Next step, they are trained to reason by feeding them with reasoning examples to get them going, and then rewarding them whenever their reasoning leads them to good answers. They learn to predict the next reasoning token that will lead them to the best answers.
The objective is imparted by their training. They are "rewarded" when their output satisfies the objective, so that as they are trained, they get better and better at achieving the objectives of the training.
> The LLM objective is whatever they are trained to do, whether it's completing text, obeying instructions, coding, etc.
That’s not an objective of the LLM, it’s an objective of the human creating, training and using it. The LLM is a tool to achieve that objective.
There is an objective: solving an optimization problem. Or, seen another way, given a matrix of learned weights, it tries to compute some final value that is as close to 1 as possible by applying that matrix to your input prompt. This is more or less what it does.
That’s not really an objective of the LLM; otherwise we’d say Dijkstra’s algorithm has an objective. It doesn’t, in the sense that a human has the objective to find the shortest path. The tools have a purpose, but that’s imbued by humans, not intrinsic to them.
Interesting that you say that. What is actually 'understanding'? A semantic mapping between tokens (words) and objects and relations between those objects? How would you define it?