> The Wason selection task is the classic example: most people fail a simple conditional reasoning problem unless it’s dressed up in familiar social context, like catching cheaters.
I'd never heard of the Wason selection task, looked it up, and could tell the right answer right away. But I can also tell you why: because I have some familiarity with formal logic and can, in your words, pattern-match the gotcha that "if x then y" is distinct from "if not x then not y".
In contrast to you, this doesn't make me believe that people are bad at logic or don't really think. It tells me that people are unfamiliar with "gotcha" formalities introduced by logicians that don't match the everyday use of language. If you added a simple addition to the problem, such as "Note that in this context, 'if' only means that...", most people would almost certainly answer it correctly.
Mind you, I'm not arguing that human thinking is necessarily more profound than what LLMs could ever do. However, judging from the output, LLMs have a tenuous grasp on reality, so I don't think that reductionist arguments along the lines of "humans are just as dumb" are fair. There's a difference that we don't really know how to overcome.
Quoting the Wikipedia article's formulation of the task for clarity:
> You are shown a set of four cards placed on a table, each of which has a number on one side and a color on the other. The visible faces of the cards show 3, 8, blue and red. Which card(s) must you turn over in order to test that if a card shows an even number on one face, then its opposite face is blue?
Confusion over the meaning of 'if' can only explain why people select the Blue card; it can't explain why people fail to select the Red card. If 'if' meant 'if and only if', then it would still be necessary to check that the Red card didn't have an even number. But according to Wason[0], "only a minority" of participants select (the study's equivalent of) the Red card.
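To make the point about the Red card concrete, here is a small Python sketch (an illustration, not something from the study) that brute-forces which cards could falsify the rule under each reading of "if". It assumes only parity matters, so the hidden number on a color card is represented by 3 (odd) or 8 (even); the helper names are made up for this example.

```python
# Sketch: which Wason cards can falsify "if even, then blue"?
# Assumption: only parity matters, so 3 and 8 stand in for any odd/even number.

CARDS = ["3", "8", "blue", "red"]   # visible faces, from the quoted task
HIDDEN_NUMBERS = [3, 8]             # possible hidden side of a color card
HIDDEN_COLORS = ["blue", "red"]     # possible hidden side of a number card


def implication(p: bool, q: bool) -> bool:
    """Material 'if p then q': false only when p is true and q is false."""
    return (not p) or q


def biconditional(p: bool, q: bool) -> bool:
    """'p if and only if q'."""
    return p == q


def must_turn(card: str, rule) -> bool:
    """A card must be turned if some possible hidden face would break the rule."""
    if card.isdigit():
        options = [(int(card), color) for color in HIDDEN_COLORS]
    else:
        options = [(number, card) for number in HIDDEN_NUMBERS]
    return any(not rule(n % 2 == 0, c == "blue") for n, c in options)


for name, rule in [("if", implication), ("iff", biconditional)]:
    print(name, [card for card in CARDS if must_turn(card, rule)])
# prints:
# if ['8', 'red']
# iff ['3', '8', 'blue', 'red']
```

Under the material reading only 8 and Red can falsify the rule; under "if and only if" all four cards can. So the biconditional reading adds 3 and Blue to the answer, but it never removes Red.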
[0] https://web.mit.edu/curhan/www/docs/Articles/biases/20_Quart...
People in everyday life are not evaluating rules. They evaluate cases, checking whether a case fits a rule.
So, when being told:
"Which card(s) must you turn over in order to test that if a card shows an even number on one face, then its opposite face is blue?"
they translate it to:
"Check the cards that show an even number on one face to see whether their opposite face is blue and vice versa"
Based on this, many would naturally pick the blue card (to test the direct case), and the 8 card (to test the "vice versa" case).
They won't check the red card to see whether there's an even number on the other side that invalidates the formulation as a general rule, because they're not in the mindset of testing a general rule.
Would they do the same if they had more familiarity with rule validation in everyday life, or if they had a more verbose and explicit explanation of the goal?
Yeah, maybe if you phrased it as "Which card(s) must you turn over in order to ensure that all even-numbered cards are blue?" you'd get a better response?
Exactly. We invented rule-based machines so that we could have a thing that follows rules, and adheres strictly to them, all day long.
I'm not sure why people keep comparing machine behaviour to humans'. It's like economic models that assume perfect rationality... yeah, that's not reality, mate.
I confidently picked 8 + blue and am now trying to understand why I personally did that. I think the wording of the puzzle is not entirely unambiguous. The question mentions both "which card(s)" and "a card", so this is what my brain immediately starts to check: every card, one by one. Do I need to test "3"? No, not even. Do I need to test "8"? Yes. Do I need to test "blue"? Yes, because I need to test whether "a card" fits the criterion. And lastly, the "red" card immediately fails the check of being "a card" that fits the criterion.
I think a corrected question should make it obvious that we are verifying not "a card" but "a rule" applicable to all cards. So "a" needs to be replaced with "all" or "any", and a mention of a rule or pattern needs to be added.
It also doesn't explain why people don't think it necessary to check the 3 to make sure it's not blue (which it would be if "if" meant "if and only if").
I think we're actually closer to agreement than it might seem.
You're right that the Wason task is partly about a mismatch between how "if" works in formal logic and how it works in everyday language. That's a fair point. But I think it actually supports what I'm saying rather than undermining it. If people default to interpreting "if x then y" as "if and only if" based on how language normally works in conversation, that is pattern-matching from familiar context. It's a totally understandable thing to do, and I'm not calling it a cognitive defect. I'm saying it's evidence that our default mode is contextual pattern-matching, not rule application. We agree on the mechanism, we're just drawing different conclusions from it.
Your own experience is interesting too. You got the right answer because you have some background in formal logic. That's exactly what I'd expect. Someone who's practiced in a domain recognizes the pattern quickly. But that's the claim: most reasoning happens within well-practiced domains. Your success on the task doesn't counter the pattern-matching thesis, it's a clean example of it working well.
On the broader point about LLMs having a "tenuous grasp on reality," I hear that, and I don't want to flatten the differences. There probably is something meaningfully different going on with how humans stay grounded. I just think the "humans reason, LLMs pattern-match" framing undersells how much human cognition is also pattern-matching, and that being honest about that is more productive than treating it as a reductionist insult.
Agree with much of your comment.
Though note that, as GP said, people famously do much better on the Wason selection task when it's framed in a social context. That at least partially undermines your theory that it's a lack of familiarity with the terminology of formal logic.
Maybe the social version just creates a context where "if x then y" obviously does not include "if not x then not y". Everyone knows people over the drinking age can drink both alcoholic and non-alcoholic drinks, so you obviously don't have to check the person drinking the soft drink to make sure they aren't an adult.
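For comparison, here is the same kind of brute-force check applied to the drinking-age version, assuming the usual setup (cards showing beer, soda, a 16-year-old and a 25-year-old, with the rule "if you are drinking alcohol, you must be 18 or older"). The names and the 16/25 stand-ins are illustrative assumptions. Logically it has the same shape as the number/color task: beer plays the role of 8, soda of 3, the 25-year-old of Blue, and the 16-year-old of Red.

```python
# Sketch: the drinking-age version, with assumed cards and stand-in ages.
# Rule: if someone is drinking alcohol, they must be 18 or older.

CARDS = ["beer", "soda", 16, 25]   # visible faces (assumed setup)
DRINKS = ["beer", "soda"]          # possible hidden side of an age card
AGES = [16, 25]                    # stand-ins for "minor" and "adult"


def violates(drink: str, age: int) -> bool:
    """Only an underage person drinking alcohol breaks the rule."""
    return drink == "beer" and age < 18


def must_check(card) -> bool:
    """Check a card only if some possible hidden side would break the rule."""
    if isinstance(card, str):  # drink is visible, age is hidden
        return any(violates(card, age) for age in AGES)
    return any(violates(drink, card) for drink in DRINKS)  # age is visible


print([card for card in CARDS if must_check(card)])  # prints ['beer', 16]
```

As with the number 3, the soft-drink card never needs checking, which matches the intuition that nobody bothers to confirm the soda drinker isn't an adult.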
For the life of me I could not solve the <18 example from Wikipedia, but the number/color one is super easy.
As they say, "think about how smart the average person is, then realize half the population is below that". There are far more haikus than opuses walking this planet.
We keep benchmarking models against the best humans and the best human institutions - then when someone points out that swarms, branching, or scale could close the gap, we dismiss it as "cheating". But that framing smuggles in an assumption that intelligence only counts if it works the way ours does. Nobody calls a calculator a cheat for not understanding multiplication - it just multiplies better than you, and that's what matters.
LLMs are a different shape of intelligence. Superhuman on some axes, subpar on others. The interesting question isn't "can they replicate every aspect of human cognition" - it's whether the axes they're strong on are sufficient to produce better than human outcomes in domains that matter. Calculators settled that question for arithmetic. LLMs are settling it for an increasingly wide range of cognitive work. The fact that neither can flip a burger is irrelevant.
Humans don't have a monopoly on intelligence. We just had a monopoly on generality and that moat is shrinking fast.
The "God of the gaps" theory is a theological and philosophical viewpoint where gaps in scientific knowledge are cited as evidence for the existence and direct intervention of a divine creator. It asserts that phenomena currently unexplained by science—such as the origin of life or consciousness—are caused by God.
We are doing an inversion of the God of the gaps, an "LLM of the gaps", where gaps in LLM capabilities are considered inherently negative and limiting.
It is not actually about the gaps in capability; instead, it arises from an understanding of how these things work and an honest acknowledgement of how far they could go.
The question is not whether these things are actually intelligent. The question is whether they will be useful without an endless supply of training data and continuous re-alignment using it.
And the question "Are these things really intelligent?" is just a proxy for that.
And we are interested in that question because answering it is necessary to justify the massive investment these things are getting now. It is quite easy to look at these things and conclude that they will continue to progress without any limit.
But that would be like looking at data compression at the time of its conception and thinking that it is only a matter of time before we can compress 100GB into 1KB.
We live in a time of scams that are obvious if you take a second look. If something requires much deeper scrutiny, it can inflate a much larger bubble.
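The compression analogy has a concrete basis in the pigeonhole principle: a lossless compressor must map distinct inputs to distinct outputs, yet there are 2^n bit strings of length n and only 2^n - 1 strictly shorter ones. A minimal Python sketch of that count (function names are illustrative) shows that no lossless scheme can shrink every input, so large ratios depend on the input being highly redundant.

```python
# Counting argument (pigeonhole principle) behind lossless compression limits.


def count_strings_of_length(n: int) -> int:
    """Number of distinct bit strings of exactly n bits."""
    return 2 ** n


def count_strings_shorter_than(n: int) -> int:
    """Number of bit strings of length 0..n-1: 1 + 2 + ... + 2**(n-1) = 2**n - 1."""
    return sum(2 ** k for k in range(n))


for n in [1, 8, 16]:
    inputs = count_strings_of_length(n)
    outputs = count_strings_shorter_than(n)
    # Fewer possible shorter outputs than inputs: some input cannot shrink.
    print(f"n={n}: {inputs} inputs vs {outputs} possible shorter outputs")
```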
> and that moat is shrinking fast.
The point is that in reality it is not; it is just an appearance. If you consider how these things work, there is no justification for this conclusion.
I have said this elsewhere, but the problem of hallucination itself, along with the need for retraining, is the smoking gun that these things are not intelligent in ways that would justify these massive investments.
> If you added a simple addition to the problem, such as "Note that in this context, 'if' only means that...", most people would almost certainly answer it correctly.
Agreed. More broadly, classical logic isn't the only logic out there. Many logics differ on the meaning of the implication "if x then y". There are multiple ways for x to imply y, those additional meanings show up in natural language all the time, and we do have logical systems to describe them; they are just less well known.
Mapping natural language into logic often requires context that lies outside the words that were written or spoken. We need to capture in formulas what people actually meant, rather than just what they wrote. Indeed, the same sentence can sometimes be ambiguous, and a logical formula never is.
As an aside, I wanna say that material implication (that is, the "if x then y" of classical logic) deeply sucks, or rather, an implication in natural language very rarely maps cleanly onto material implication. Having "if x then y" be vacuously true when x is false is something usually associated with people who smirk at clever wordplay, rather than something people actually mean when they say "if x then y".
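The vacuous-truth complaint is easiest to see in a truth table. This short Python sketch (function names are illustrative) prints material implication next to the biconditional; the biconditional column is what you get if "if x then y" is also taken to mean "if not x then not y". The two columns differ only in the row where x is false and y is true.

```python
from itertools import product


def material(x: bool, y: bool) -> bool:
    """Classical 'if x then y': false only when x is true and y is false."""
    return (not x) or y


def biconditional(x: bool, y: bool) -> bool:
    """'x if and only if y', i.e. 'if x then y' plus 'if not x then not y'."""
    return x == y


print("x     y     x->y  x<->y")
for x, y in product([True, False], repeat=2):
    # !s turns each bool into text so it can be left-aligned in 5 columns
    print(f"{x!s:<5} {y!s:<5} {material(x, y)!s:<5} {biconditional(x, y)!s:<5}")
```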
Your response contains a performative contradiction: you are asserting that humans are naturally logical while simultaneously committing several logical errors to defend that claim.
This comment would be a lot more useful with an enumeration of those logical errors.
The commenter’s specific claim, that adding a note about the definition of "if" would solve the problem, is a moving-the-goalposts fallacy and a tautology. The comment also suffers from hasty generalization (in their experience the test isn't hard) and special pleading (a double standard for LLMs and humans).
When someone tells you "you can have this if you pay me", they don't mean "you can also have it if you don't pay". They are implicitly but clearly indicating you gotta pay.
It's as simple as that. In common use, "if x then y" frequently implies "if not x then not y". Pretending that it's some sort of a cognitive defect to interpret it this way is silly.
In the original studies, most people made an error that can't be explained by that misunderstanding: they failed to select the card showing 'not y'.
From my armchair this feels relevant:
> Decoding analyses of neural activity further reveal significant above chance decoding accuracy for negated adjectives within 600 ms from adjective onset, suggesting that negation does not invert the representation of adjectives (i.e., “not bad” represented as “good”)[...]
From: Negation mitigates rather than inverts the neural representations of adjectives
At: https://journals.plos.org/plosbiology/article?id=10.1371/jou...