Not sure how you could read this essay and come to that conclusion. It definitely aligns with my own understanding, and his conclusions seem pretty reasonable (though the AI 2027/Situational Awareness part might be arguable)
Absolutely:
> In order to predict where thinking and reasoning capabilities are going, it's important to understand the trail of thought that went into today's thinking LLMs.
No. You don't understand at all. They don't think. They don't reason. They are statistical word generators. They are very impressive at things like writing code, but they don't work the way that's being implied here.
This is an outdated view.
People get the idea that just because LLMs are initially trained to predict word sequences, that's all they can do. This is not the case.
Transformers are general-purpose learning mechanisms. Word sequences are just the first thing we teach them. Then we fine-tune them with human feedback (RLHF). Then we sometimes hook them up to math and logic checkers and train them further with reinforcement learning, so they learn logical and mathematical reasoning. The article describes this process in some detail.
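Here's a toy sketch of those phases, just to make the shape of the pipeline concrete. `TinyLM` is a made-up bigram counter, nothing like a real transformer, and the reward step is a cartoon of RLHF / verifier-based RL rather than anyone's actual training code:

```python
from collections import defaultdict
import random

# Toy illustration of the phases above: "pretrain" learns next-word
# statistics from raw text; "reinforce" nudges the weights with a reward
# signal that could come from human preferences (RLHF) or a math/logic
# checker. A real transformer is vastly more capable, but the staging is
# the same: word prediction first, feedback-driven tuning after.

class TinyLM:
    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(float))

    def pretrain(self, corpus):
        # Phase 1: plain next-word prediction from text.
        for sentence in corpus:
            words = sentence.split()
            for prev, nxt in zip(words, words[1:]):
                self.weights[prev][nxt] += 1.0

    def generate(self, word):
        # Sample the next word in proportion to learned weights.
        options = self.weights[word]
        if not options:
            return None
        choices, weights = zip(*options.items())
        return random.choices(choices, weights=weights)[0]

    def reinforce(self, prev, nxt, reward):
        # Phases 2/3: boost (or dampen) a continuation based on a reward.
        self.weights[prev][nxt] *= (1.0 + reward)

lm = TinyLM()
lm.pretrain(["the cat sat on the mat", "the dog sat on the rug"])
print(lm.generate("the"))        # pure word prediction
lm.reinforce("the", "cat", 2.0)  # a reward signal shifts its behavior
print(lm.generate("the"))
```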
With "reasoning models" we also let the model have some internal monologue before generating output, so it can "think through" the problem.
We didn't do all that with early LLMs. Those were just word predictors. But now we do those things, and that's why our lowly LLMs are writing large software projects that actually work, and making real headway on math problems that had stood open for decades.
The weirdest part of all this is that LLMs started showing signs of reasoning with an internal world model even before we trained them for it specifically. Microsoft Research showed this in the well-known "Sparks of AGI" paper in 2023: you could give GPT-4 a list of odd-shaped objects that almost certainly never appear together in any source text, ask it how to stack them so they wouldn't fall over, and it would come up with a workable solution.
Finally, don't overlook the multimodal models, which work explicitly with images, video, and 3D world models in addition to text.