HN is resistant because at the end of the day, these are LLMs. They cannot and do not think. They generate plausible responses. Try this in your favorite LLM: "Suppose you're on a game show trying to win a car. There are three doors, one with a car and two with goats. You pick a door. The host then gives you the option to switch doors. What is the best strategy in this situation?" The LLM will recognize this as SIMILAR to the Monty Hall problem and tell you to always switch. I just reproduced this with ChatGPT.
But this is completely wrong! In the Monty Hall problem, the host has to reveal a door with a goat behind it for you to gain the benefit of switching. I have to point this out for the LLM to get it right. It did not reason about the problem I gave it, it spat out the most likely response given the "shape" of the problem.
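For what it's worth, a quick simulation bears this out (a rough sketch of my own; the helper names are mine, nothing the LLM produced): in the problem exactly as I stated it, switching gains nothing, while in the real Monty Hall game switching wins about 2/3 of the time.

    # Rough sketch: compares the problem as stated above (no door is opened)
    # with the classic Monty Hall game (host reveals an unpicked goat door).
    import random

    def play(reveal_goat, switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            car = random.randrange(3)
            pick = random.randrange(3)
            others = [d for d in range(3) if d != pick]
            if reveal_goat:
                # Host opens an unpicked door hiding a goat.
                opened = next(d for d in others if d != car)
                others = [d for d in others if d != opened]
            if switch:
                pick = random.choice(others)
            wins += (pick == car)
        return wins / trials

    print(play(reveal_goat=False, switch=True))   # ~0.33: switching gains nothing
    print(play(reveal_goat=False, switch=False))  # ~0.33: staying is just as good
    print(play(reveal_goat=True,  switch=True))   # ~0.67: the classic result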
This is why shrugging and saying "well humans get things wrong too" is off base. The problem is that the LLM is not thinking, period. So it cannot create a mental model of your understanding of a subject, it is taking your text and generating the next message in a conversation. This means that the more niche the topic (or your particular misunderstanding), the less useful it will get.
That's because the LLM assumes you mean for the host to open the door.
As an autistic person I can assure you that people filling in information you neither gave nor implied is the norm for human conversations. In that sense, the LLMs are being human-like.
If you explicitly tell it you didn't forget to have the host open the door it gives you the right answer:
> There is a game show with three closed doors. One has a car and two have goats. I pick a door. Then, without opening my door or any other door, the host asks me if I want to switch my choice to one of the other two doors. What is my best strategy and what are the probabilities?
Gemini: This is a classic probability puzzle known as the Monty Hall problem, though your description presents a slight variation. The optimal strategy depends entirely on whether the host opens a door or not.

Your Scenario: The Host Doesn't Open a Door

In the situation you've described, where the host asks if you want to switch your choice to one of the other two doors without opening any doors, there is no advantage to switching. Here's a breakdown of the probabilities in your specific scenario:

- Your initial choice: When you first pick a door, you have a 1 in 3 chance of selecting the door with the car. This means you have a 2 in 3 chance of having picked a door with a goat.
- The other two doors: The combined probability that the car is behind one of the other two doors is 2/3.
- Switching your choice: If you decide to switch, you are essentially choosing one of the other two doors. Since there is no new information (a door with a goat has not been revealed), the 2/3 probability is evenly split between those two doors. Therefore, your chance of winning by switching to a specific one of the other doors is 1/3.

In this case, your odds of winning are 1/3 whether you stay with your original choice or switch to one of the other doors.
> That's because the LLM assumes you mean for the host to open the door.
LLMs cannot "assume". There is no thinking involved. It sees that the prompt looks like the Monty Hall problem and it just goes full steam ahead.
>If you explicitly tell it you didn't forget to have the host open the door it gives you the right answer:
That should not be necessary. I asked it a very clear question. I did not mention Monty Hall. This is the problem with LLMs: it did not analyze the problem I gave it; it produced content that is the likely response to my prompt. My prompt was Monty Hall-shaped, so it gave me the Monty Hall answer.
You are saying "ah but then if you prepare for the LLM to get it wrong, then it gets it right!" as if that is supposed to be convincing! Consider the millions of other unique questions you can ask, each with their own nuances, that you don't know the answer to. How can you prevent the LLM from making these mistakes if you don't already know the mistakes it's going to make?
> LLMs cannot "assume". There is no thinking involved. It sees that the prompt looks like the Monty Hall problem and it just goes full steam ahead.
I think the poster's point was that many humans would do the same thing.
Try a completely different problem, one you invented yourself, and see what you get. I'd be very interested to hear the response back here.
Humans who have heard of Monty Hall might also say you should always switch without noticing that the situation is different. That's not evidence that they can't think, just that they're fallible.
People on here always assert LLMs don't "really" think or don't "really" know without defining what all that even means, and to me it's getting pretty old. It feels like an escape hatch so we don't feel like our human special sauce is threatened, a bit like how people felt threatened by heliocentrism or evolution.
> Humans who have heard of Monty Hall might also say you should always switch without noticing that the situation is different. That's not evidence that they can't think, just that they're fallible.
At some point we start playing a semantics game over the meaning of "thinking", right? Because if a human makes this mistake because they jumped to an already-known answer without noticing a changed detail, it's because (in the usage of the person you're replying to) the human is pattern matching instead of thinking. I don't think this is surprising. In fact, I think much of what passes for thinking in casual conversation is really just applying heuristics we've trained in our own brains to give us the correct answer without having to think rigorously. We remember mental shortcuts.
On the other hand, I don't think it's controversial that (some) people are capable of performing the rigorous analysis of the problem needed to give a correct answer in cases like this fake Monty Hall problem. And that's key... if you provide slightly more information and call out the changed nature of the problem to the LLM, it may give you the correct response, but it can't do the sort of reasoning that would reliably give you the correct answer the way a human can. I think that's why the GP doesn't want to call it "thinking" - they want to reserve that for a particular type of reflective process that can rigorously perform logical reasoning in a consistently valid way.
I'm not sure what your argument is. The common claim that annoys me about LLMs on here is that they're not "really" coming up with ideas but that they're cheating and just repeating something they read on the internet somewhere that was written by a human who can "really" think. To me this is obviously false if you've talked to a SOTA LLM or know a little about how they work.
On the other hand, computers are supposed to be both accurate and able to reproduce said accuracy.
The failure of an LLM to reason this out indicates that, really, it isn't reasoning at all. It's a subtle but welcome reminder that it's pattern matching.
Computers might be accurate but statistical models never were 100% accurate. That doesn't imply that no reasoning is happening. Humans get stuff wrong too but they certainly think and reason.
"Pattern matching" to me is another one of those vague terms like "thinking" and "knowing" that people decide LLMs do or don't do based on vibes.
Pattern matching has a definition in this field, it does mean specific things. We know machine learning has excelled at this in greater and greater capacities over the last decade.
The other part of this is weighted filtering given a set of rules, which is a simple analogy to how AlphaGo did its thing.
Dismissing all this as vague is effectively doing the same thing you're saying others do.
This technology has limits, and despite what Altman says, we do know this; we are exploring them, but within its own confines. They’re fundamentally wholly understandable systems that work on a consistent level in terms of how they do what they do (that is separate from the actual produced output)
I think reasoning, as any layman would use the term, is not accurate to what these systems do.
You're derailing the conversation. The discussion was about thinking, and now you're arguing about something entirely different and didn't even mention the word “think” a single time.
If you genuinely believe that anyone knows how LLMs work, how brains work, and/or how or why the latter does “thinking” while the former does not, you're just simply wrong. AI researchers fully acknowledge ignorance in this matter.
> Pattern matching has a definition in this field, it does mean specific things.
Such as?
> They’re fundamentally wholly understandable systems that work on a consistent level in terms of how they do what they do (that is separate from the actual produced output)
Multi-billion-parameter models are definitely not wholly understandable, and I don't think any AI researcher would claim otherwise. We can train them, but we don't know how they work any more than we understand how the training data was made.
> I think reasoning, as any layman would use the term, is not accurate to what these systems do.
Based on what?
You're welcome to provide counters. I think these are all sufficiently common things that they stand on their own in support of what I posit.
Look, you're claiming something, it's up to you to back it up. Handwaving what any of these things mean isn't an argument.
I guess computer vision didn't get this memo and it is useless.
>People on here always assert LLMs don't "really" think or don't "really" know without defining what all that even means,
Sure.
To Think: able to process information in a given context and arrive at an answer or analysis. An LLM only simulates this with pattern matching. It didn't really consider the problem; it did the equivalent of googling a lot of terms and then spitting out something that sounded like an answer.
To Know: To reproduce information based on past thinking, as well as to properly verify and reason about the information. I know 1+1 = 2 because (I'm not a math major, feel free to inject number theory instead) I was taught that arithmetic is a form of counting, and I was taught the mechanics of counting to prove how to add. Most LLMs don't really "know" this to begin with, for the reasons above. Maybe we'll see if this study mode is different.
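To make the counting picture concrete, here is a toy Peano-style sketch of my own (purely an illustration of what I mean by "knowing" addition through counting, not a claim about how any model works internally):

    # Toy sketch: numbers are built from zero by repeated "successor"
    # (counting up by one), and addition is defined purely by counting.
    ZERO = ()

    def succ(n):
        # Count up by one.
        return (n,)

    def add(a, b):
        # a + 0 = a; a + succ(b) = succ(a + b)
        return a if b == ZERO else succ(add(a, b[0]))

    ONE = succ(ZERO)
    TWO = succ(ONE)
    print(add(ONE, ONE) == TWO)  # True: 1 + 1 = 2 falls out of the counting rules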
Somehow I am skeptical if this will really change minds, though. People making swipes at the community like this often are not really engaging in a conversation with ideas they oppose.
I have to push back on this. It's the people who constantly assert that LLMs “don't think” who are not engaging in a conversation. It's a thought-terminating cliché.
Unfortunately, even those willing to engage in this conversation still don't have much to converse about, because we simply don't know what thinking actually is, how the brain works, how LLMs work, and to what extent they are similar or different. That makes it all the more vexing to me when people say this, because the only thing I can say in response is “you don't know that (and neither does anyone else)”.
>It's the people who constantly assert that LLMs “don't think” who are not engaging in a conversation.
I'm responding to the conversation. Oftentimes it centers on "AI is smarter than me/other people". It's in the name, but "intelligence" is a facade put on by the machine to begin with.
>because we simply don't know what thinking actually is
I described my definition. You can disagree or make your own interpretation, but to dismiss my conversation and simply say "no one knows" is a bit ironic for a person accusing me of not engaging in a conversation.
Philosophy spent centuries trying to answer that question. Mine is a simple, pragmatic approach. Just because there's no objective answer doesn't mean we can't converse about it.
You're just deferring to another vague term "pattern matching".
If I think back to something I was taught in primary school and conclude that 1+1=2 is that pattern matching? Therefore I don't really "know" or "think"?
People pretend like LLMs are some 80s Markov chain model or nearest-neighbor search, which is just uninformed.
Do you want to shift the discussion to the definition of a "pattern", or are we going to continue to move the goalposts? I'm trying to respond to your inquiry and instead we're just stuck in minutiae.
Yes, to make an apple pie from scratch, we must first invent the universe. Is that a productive conversation to fall into, or can we just admit that you're dismissing any opinion that goes against your view?
>If I think back to something I was taught in primary school and conclude that 1+1=2 is that pattern matching?
Yes. That is an example of pattern matching. Let me know when you want to go back to talking about LLMs.
So because I'm pattern matching that means I'm not thinking right? That's the same argument you have for LLMs.
LLMs are vulnerable to your input because they are still computers, but you're setting them up to fail with how you've given them the problems. Humans would fail in similar ways. The only thing you've proven with this reply is that you think you're clever, but really, you are not thinking, period.
And if a human failed on this question, it would be because they weren't paying attention and made the same pattern-matching mistake. But we're not paying the LLM to pattern match; we're paying it to answer correctly. Humans can think.
“paying the LLM”??
I use the Monty Hall problem to test people in two steps. The second step is, after we discuss it and come up with a framing that they can understand, can they then explain it to a third person. The third person rarely understands, and the process of the explanation reveals how shallow the understanding of the second person is. The shallowest understanding of any similar process that I've usually experienced is an LLM.
I am not sure how good your test really is. Or at least how high your bar is.
Paul Erdős was told about this problem with multiple explanations and just rejected the answer. He could not believe it until they ran a simulation.
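For anyone who, like Erdős reportedly was, is only persuaded by brute force: a small exhaustive check (my own sketch) over every equally likely car placement and first pick makes the 2/3 result hard to argue with.

    # Sketch: enumerate all 9 equally likely (car, first pick) pairs in the
    # classic game, with the host opening an unpicked goat door each time.
    # (When the first pick is the car, the host could open either goat door;
    # switching loses either way, so the deterministic choice below is fine.)
    from itertools import product

    stay_wins = switch_wins = 0
    for car, pick in product(range(3), repeat=2):
        opened = next(d for d in range(3) if d != pick and d != car)
        switched = next(d for d in range(3) if d != pick and d != opened)
        stay_wins += (pick == car)
        switch_wins += (switched == car)

    print(stay_wins, switch_wins)  # 3 vs 6 out of 9: switching wins 2/3 of the time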
In my experience, as Harvard outlined long ago, the two main issues with decision making are frame blindness (don't consider enough other ways of thinking about the issue) and non-rigorous frame choice (jumping to conclusions).
But an even more fundamental cause, in my experience as a teacher, is that seemingly different frames often turn out to be simply misunderstood, not understood and rejected. I learned this by trying many ways of presenting what I thought the best frame was. I learned that "explanations" may be received primarily as noise, with "what is actually being said" incorrectly replaced by "what I think you probably mean". Whenever someone replies "okay" to a yes-or-no comment or statement, I find they have always misunderstood the statement, and I learned how often people will attempt to move forward without understanding where they are.
And if multiple explanations are just restatements of the same frame (as is common in casual arguments), it's impossible to compare frames, because only one is being presented. It's the old "if you think you aren't making any mistakes, that's another mistake".
Often, a faulty frame both clears up what is wrong with another frame and points the way to the best one. I usually find the most fundamental frame is the most useful.
For example, I found many Reddit forums discussing a problem with selecting the choice of audio output (speaker) on Fire TV Sticks. If you go through the initial setup, sometimes it will give you a choice (first level of flow chart), but often not the next level choice, which you need. And setup will not continue. Then it turned out that old remotes and new remotes had the volume buttons in a different location, and there were two sets of what looked like volume buttons. When you pressed the actual volume buttons, everything worked normally. When you pressed the up/down arrows where the old volume buttons had been, you had to restart setup many times.
The correct framing of the problem was "Volume buttons are now on the left, not the right." It was not a software setup issue. Or wondering why your key doesn't work, when you're at the wrong car. Or it's not a problem with your starter motor, you're out of gas. Etc.
I don't know who Paul Erdős is, so this isn't useful information without knowing why he rejected the answer and what counterarguments were provided. It is an unintuitive problem space when approached as a simple probability problem rather than one where revealing new context changes the odds.
Erdős published more papers than any other mathematician in history and collaborated with more than 500 coauthors, giving rise to the concept of the "Erdős number," a playful measure of collaborative proximity among mathematicians.