Hacker News

In a way, the main problem with LLMs isn't that they are wrong sometimes. We humans are used to that. We encounter people who are professionally wrong all the time. Politicians, con-men, scammers, even people who are just honestly wrong. We have evaluation metrics for those things. Those metrics are flawed because there are humans on the other end intelligently gaming those too, but generally speaking we're all at least trying.

LLMs don't fit those signals properly. They always sound like an intelligent person who knows what they are talking about, even when spewing absolute garbage. Even very intelligent people, even very intelligent people in the field of AI research are routinely bamboozled by the sheer swaggering confidence these models convey in their own results.

My personal opinion is that any AI researcher who was shocked by the paper lynguist mentioned ought to be ashamed of themselves and their credulity. That was all obvious to me; I couldn't have told you the exact mechanism the arithmetic was being performed (though what is was doing was well in the realm of what I would have expected from a linguistic AI trying to do math), but the fact that its chain of reasoning bore no particular resemblance to how it drew its conclusions was always obvious. A neural net has no introspection on itself. It doesn't have any idea "why" it is doing what it is doing. It can't. There's no mechanism for that to even exist. We humans are not directly introspecting our own neural nets, we're building models of our own behavior and then consulting the models, and anyone with any practice doing that should be well aware of how those models can still completely fail to predict reality!

Does that mean the chain of reasoning is "false"? How do we account for it improving performance on certain tasks then? No. It means that it is occurring at a higher level and a different level. It is quite like humans imputing reasons to their gut impulses. With training, combining gut impulses with careful reasoning is actually a very, very potent way to solve problems. The reasoning system needs training or it flies around like an unconstrained fire hose uncontrollably spraying everything around, but brought under control it is the most powerful system we know. But the models should always have been read as providing a rationalization rather than an explanation of something they couldn't possibly have been explaining. I'm also not convinced the models have that "training" either, nor is it obvious to me how to give it to them.

(You can't just prompt it into a human, it's going to be more complicated than just telling a model to "be carefully rational". Intensive and careful RHLF is a bare minimum, but finding humans who can get it right will itself be a challenge, and it's possible that what we're looking for simply doesn't exist in the bias-set of the LLM technology, which is my base case at this point.)