> Latent reasoning doesn't really appear until around 100B params.
Please provide a citation for wild claims like this. Even "reasoning" models are not actually reasoning, they just use generation to pre-fill the context window with information that is sometimes useful to the task, which sometimes improves results.
I hear random users here talk about "emergent behavior" like "latent reasoning" but never anyone serious talking about it (exception: people who are profiting off the current bubble), so I'd _love_ to see rigorous definitions of these terms and evidence of this behavior, especially from someone who doesn't stand to gain from another cash infusion from SoftBank.
I suspect these things don't exist. At the very most, they're a mirage, and exist in the way a rainbow does. Go on and try to find that pot of gold, eh?
> Please provide a citation for wild claims like this. Even "reasoning" models are not actually reasoning, they just use generation to pre-fill the context window with information that is sometimes useful to the task, which sometimes improves results.
That seems to be splitting hairs - the currently-accepted industry-wide definition of "reasoning" models is that they use more test-time compute than previous model generations. Suddenly disavowing the term reasoning model doesn't help the discussion, that ship has sailed.
My understanding is that reasoning is an emergent behavior of reinforcement learning steps in model training, where task performance is rewarded, and (with no external input!) the model output starts to include phrases à la "Wait, let me think". Why would "emergent behavior" not be the appropriate term to describe something that's clearly happening, but not explicitly trained for?
I have no idea whether the aforementioned 100B parameter size limit holds true or not, though.
Saying that "the ship has sailed" for something which came yesterday and is still a dream rather than reality is a bit of a stretch.
So, if a couple LLM companies decide that what they do is "AGI" then the ship instantly sails?
Only matters if they can convince others that what they do is AGI.
As always ignore the man behind the curtain.
Just like esoteric appropriation of 'quantum entanglement', right? It's vibe semantics now.
I'm almost positive reasoning is not an emergent behavior, considering that reasoning models have a specific architecture. As a source: https://arxiv.org/html/2504.09762v1
> currently-accepted industry-wide definition of "reasoning"
You can't both (1) declare "reasoning" to be something wildly different from what humans mean by reasoning and (2) insist people are wrong when they use the normal definition to say models don't reason. You gotta pick a lane.
I don't think it's too problematic; it's hard to say something is "reasoning" without saying what that something is. As another example of a term that adjusts its meaning to context, take the word "cache" in "processor cache": we know what that is because it's in the context of a processor. Then there's "cache me outside", which comes from some TV episode.
It's a tough line to tread.
Arguably, a lot of unending discourse about the "abilities" of these models stems from using ill-defined terms like reasoning and intelligence to describe these systems.
On the one hand, I see the point that we really struggle to define intelligence, consciousness etc for humans, so it's hard to categorically claim that these models aren't thinking, reasoning or have some sort of intelligence.
On the other, it's also transparent that a lot of the words are chosen somewhat deliberately to anthropomorphize the capabilities of these systems for pure marketing purposes. So the claimant needs to demonstrate something beyond rebutting with "Well the term is ill-defined, so my claims are valid."
And I'd even argue the marketers have won overall: by refocusing the conversation on intelligence and reasoning, the more important conversation about the factually verifiable capabilities of the system gets lost in a cycle of circular debate over semantics.
Sure, but maybe the terms intelligence and reasoning aren't that bad when describing what human behavior we want these systems to replace or simulate. I'd also argue that while we struggle to define what these terms actually mean, we struggle less to remember what they represent when using them.
I'd even argue that it's appropriate to use these terms, because machine intelligence kinda sorta looks and acts like human intelligence, and machine reasoning models kinda sorta look like how a human brain reasons about things, or infers consequences of assertions, "it follows that", etc.
Like computer viruses, we call them viruses because they kinda sorta behave like a simplistic idea of how biological viruses work.
> currently-accepted industry-wide definition of "reasoning"
The currently-accepted industry-wide definition of reasoning will probably only apply to whatever industry we're describing, i.e., are we talking about human-built machines, or the biological brain activity we kinda sorta model these machines on?
Marketing can do what it wants; I've got no control over either the behavior of marketers or their effect on their human targets.
Or you could accept that sometimes fields contain terms-of-art that are non-intuitive to outsiders. Go ask an astronomer what their working definition of a metal is.
No. This is the equivalent of an astronomer telling a blacksmith they're using the term "metal" incorrectly. Your jargon does not override everyone else's language.
> Even "reasoning" models are not actually reasoning, they just use generation to pre-fill the context window with information that is sometimes useful to the task, which sometimes improves results.
I agree that seems weak. What would “actual reasoning” look like for you, out of curiosity?
Not parent poster, but I'd approach it as:
1. The guess_another_token(document) architecture has been shown not to obey the formal logic we want (see the sketch after this list).
2. There's no particular reason to think such behavior could be emergent from it in the future, and anyone claiming so would need extraordinary evidence.
3. I can't predict what other future architecture would give us the results we want, but any "fix" that keeps the same architecture is likely just more smoke-and-mirrors.
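To make point 1 concrete: by guess_another_token(document) I mean nothing fancier than the plain autoregressive loop. A toy sketch (model and tokenizer here are Hugging Face-style stand-ins, not any specific product's inference code):

    import torch

    def generate(model, tokenizer, document: str, max_new_tokens: int = 64) -> str:
        # encode the prompt, then repeatedly "guess another token" and append it
        ids = tokenizer.encode(document, return_tensors="pt")
        for _ in range(max_new_tokens):
            logits = model(ids).logits[:, -1, :]                   # scores for the next token only
            next_id = torch.argmax(logits, dim=-1, keepdim=True)   # greedy guess
            ids = torch.cat([ids, next_id], dim=-1)                # append and repeat
            if next_id.item() == tokenizer.eos_token_id:
                break
        return tokenizer.decode(ids[0])

Everything "reasoning"-like has to come out of that loop; there is no separate logic engine underneath it.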
Seems to fall apart at 1
>1. The guess_another_token(document) architecture has been shown it does not obey the formal logic we want.
What formal logic of 'reasoning' have humans been verified to obey that LLMs don't?
... Consider this exchange:
Alice: "Bob, I know you're very proud about your neural network calculator app, but it keeps occasionally screwing up with false algebra results. There's no reason to think this new architecture will reliably do all the math we need."
Bob: "How dare you! What algebra have humans been verified to always succeed-at which my program doesn't?! Huh!? HUH!?"
___________
Bob's challenge, like yours, is not relevant. The (im)perfection of individual humans doesn't change the fact that the machine we built to do things for us is giving bad results.
It's not irrelevant, because this is an argument about whether the machine can be said to be reasoning or not.
If Alice had concluded that this occasional mistake NN calculator was 'not really performing algebra', then Bob would be well within his rights to ask Alice what on earth she was going on about.
> If Alice had concluded that this occasional mistake NN calculator was 'not really performing algebra', then Bob would be well within his rights to ask Alice what on earth she was going on about.
No, your burden of proof here is totally bass-ackwards.
Bob's the one who asked for blind trust that his magical auto-learning black-box would be made to adhere to certain rules... but the rules and trust are broken. Bob's the one who has to start explaining the discrepancy, and whether the failure is (A) a fixable bug or (B) an unfixable limitation that can be reliably managed or (C) an unfixable problem with no good mitigation.
> It's not irrelevant, because this is an argument about whether the machine can be said to be reasoning or not.
Bringing up "b-b-but homo sapiens" is only "relevant" if you're equivocating the meaning of "reasoning", using it in a broad, philosophical, and kinda-unprovable sense.
In contrast, the "reasoning" we actually wish LLMs would do involves capabilities like algebra, syllogisms, deduction, and the CS-classic boolean satisfiability.
However the track-record of LLMs on such things is long and clear: They fake it, albeit impressively.
The LLM will finish the popular 2+2=_, and we're amazed, but when we twiddle the operands too far, it gives nonsense. It answers "All men are mortal. Socrates is a man. Therefore, Socrates is ______", but reword the situation enough and it breaks again.
>Bob's the one who asked for blind trust that his magical auto-learning black-box would be made to adhere to certain rules... but the rules and trust are broken.
This is the problem with analogies. Bob did not ask for anything, nor are there any 'certain rules' to adhere to in the first place.
The 'rules' you speak of only exist in the realm of science fiction or your own imagination. Nothing that is remotely considered a general intelligence (whether you think that's just humans or include some of our animal friends) is an infallible logic automaton. Such a thing literally does not exist. Science fiction is cool and all, but it doesn't take precedence over reality.
>Bringing up "b-b-but homo sapiens" is only "relevant" if you're equivocating the meaning of "reasoning", using it in a broad, philosophical, and kinda-unprovable sense.
You mean the only sense that actually exists? Yes. It's also not 'unprovable' in the sense I'm asking about: nobody has any issue answering this question for humans, rocks, bacteria, or a calculator. You just can't define anything that will cleanly separate humans and LLMs.
>In contrast, the "reasoning" we actually wish LLMs would do involves capabilities like algebra, syllogisms, deduction, and the CS-classic boolean satisfiability.
Yeah, and they're capable of doing all of those things. The best LLMs today are better than most humans at them, so again, what is Alice rambling about?
>The LLM will finish the popular 2+2=_, and we're amazed, but when we twiddle the operands too far, it gives nonsense.
Query GPT-5 medium thinking on the API with up to (I didn't bother testing higher) 13-digit multiplication of any random numbers you wish. Then watch it get it exactly right.
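If you want to check this yourself, a throwaway harness like the one below is enough. The exact arithmetic is done in plain Python; the model name and the reasoning-effort parameter are assumptions here, so substitute whatever your provider actually exposes:

    import random
    import re
    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        # model name and reasoning_effort are assumptions; adjust to your provider
        resp = client.chat.completions.create(
            model="gpt-5",
            reasoning_effort="medium",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    a = random.randint(10**12, 10**13 - 1)   # random 13-digit operands
    b = random.randint(10**12, 10**13 - 1)
    reply = ask(f"Compute {a} * {b}. Reply with only the final integer.")
    digits = re.sub(r"\D", "", reply)        # strip commas/whitespace from the reply
    print("model:", digits, "exact:", a * b, "match:", digits == str(a * b))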
Weeks ago, I got Gemini 2.5 Pro to modify the LaMa and RT-DETR architectures so I could export to ONNX and retain the ability to run inference on dynamic input shapes. This was not a trivial exercise.
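The export side of that ultimately comes down to the dynamic_axes argument of torch.onnx.export; a minimal sketch with a placeholder model (the actual work was in modifying the LaMa / RT-DETR code so their ops tolerate symbolic shapes at all):

    import torch
    import torchvision

    model = torchvision.models.resnet18(weights=None).eval()  # placeholder, not LaMa/RT-DETR
    dummy = torch.randn(1, 3, 224, 224)

    torch.onnx.export(
        model,
        dummy,
        "model.onnx",
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={  # mark batch, height, and width as symbolic dimensions
            "input": {0: "batch", 2: "height", 3: "width"},
            "output": {0: "batch"},
        },
        opset_version=17,
    )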
>It answers "All men are mortal. Socrates is a man. Therefore, Socrates is ______", but reword the situation enough and it breaks again.
Do you actual have an example of a reword SOTA models fail at ?
> Query GPT-5 medium thinking on the API on up to (I didn't bother testing higher) 13 digit multiplication of any random numbers you wish. Then watch it get it exactly right.
I'm not sure if "on the API" here means "the LLM and nothing else." This is important because it's easy to overestimate the algorithm when you give it credit for work it didn't actually do.
In general, human developers have taken steps to make the LLM transcribe the text you entered into a classically-made program, such as a calculator app, python, or Wolfram Alpha. Without that, the LLM would have to use its (admittedly strong) powers of probabilistic fakery [0].
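To spell out what "transcribe" means here, the pattern is schematically something like this (a generic sketch, not any particular vendor's tool-calling API): the model emits a structured call, and ordinary code produces the number.

    import json

    def dispatch(model_output: str) -> str:
        # the model only transcribed the request into this structured call;
        # a classically-written program does the actual arithmetic
        call = json.loads(model_output)
        if call["tool"] == "calculator":
            a, b, op = call["a"], call["b"], call["op"]
            result = {"*": a * b, "+": a + b, "-": a - b}[op]
            return str(result)
        raise ValueError("unknown tool")

    # the "impressive" answer the user sees comes from this line, not from the model
    print(dispatch('{"tool": "calculator", "a": 7613, "b": 9241, "op": "*"}'))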
Why does it matter? Suppose I claimed I had taught a chicken to do square roots. Suspicious, you peer behind the curtain, and find that the chicken was trained to see symbols on a big screen and peck the matching keys on pocket calculator. Wouldn't you call me a fraud for that?
_____________
Returning to the core argument:
1. "Reasoning" that includes algebra, syllogisms, deduction, etc. involves certain processes for reaching an answer. Getting a "good" answer through another route (like an informed guess) is not equivalent.
2. If an algorithm cannot do the algebra process, it is highly unlikely that it can do the others.
3. If an algorithm has been caught faking the algebra process through other means, any "good" results for other forms of logic should be considered inherently suspect.
4. LLMs are one of the algorithms in points 2 and 3.
_____________
[0] https://www.mindprison.cc/p/why-llms-dont-ask-for-calculator...
>I'm not sure if "on the API" here means "the LLM and nothing else." This is important because it's easy to overestimate the algorithm when you give it credit for work it didn't actually do.
That's what I mean, yes. There is no tool use for what I mentioned.
>1. "Reasoning" that includes algebra, syllogisms, deduction, etc. involves certain processes for reaching an answer. Getting a "good" answer through another route (like an informed guess) is not equivalent.
Again, if you cannot confirm that these 'certain processes' are present when humans do it but not when LLMs do it, then your 'processes' might as well be made up.
And unless you concede humans are also not performing 'true algebra' or 'true reasoning', then your position is not even logically consistent. You can't eat your cake and have it.
No. I see AI people use this reasoning all the time and it's deeply misleading.
"You can't explain how humans do it, therefore you can't prove my statistical model doesn't do it" is kinda just the god of the gaps fallacy.
It abuses the fact that we don't understand how human cognition works, and therefore it's impossible to come up with a precise technical description. Of course you're going to win the argument, if you insist the other party do something currently impossible before you will accept their idea.
It's perfectly fine to use a heuristic for reasoning, as the other person did. LLMs don't reason by any reasonable heuristic.
>No. I see AI people use this reasoning all the time and it's deeply misleading. "You can't explain how humans do it, therefore you can't prove my statistical model doesn't do it" is kinda just the god of the gaps fallacy.
No, this is 'stop making claims you cannot actually support'.
>It abuses the fact that we don't understand how human cognition works, and therefore it's impossible to come up with a precise technical description.
Are you hearing yourself? If you don't understand how human cognition works, then any claims about what is and isn't cognition should be taken with less than a grain of salt. You're in no position to be making such strong claims.
If you go ahead and make such claims, then you can be hardly surprised if people refuse to listen to you.
And by the way, we don't understand the internals of Large Neural Networks much better than human cognition.
>It's perfectly fine to use a heuristic for reasoning
You can use whatever heuristic you want and I can rightly tell you it holds no more weight than fiction.
It's the same bitching every time an LLM post can be responded to: IT'S NOT THINKING!!! And then the poster fails to define thinking, or offer a better word than "thinking" for LLM self-play. I consider these posts to be on par, quality-wise, with "FRIST!!!!!!" posts.
Idk I think saying it’s “computing” is more precise because “thinking” applies to meatbags. It’s emulating thinking.
Really I just think that anthropomorphizing LLMs is a dangerous road in many ways and really it’s mostly marketing BS anyway.
I haven’t seen anything that shows evidence of LLMs being anything beyond a very sophisticated computer system.
Do submarines swim? Thinking is something that doesn’t happen inside a machine. Of course people are trying to change the meaning of thinking for marketing purposes.
Ironically, in the UUV space, they use the term “flying” when talking about controlling UUVs.