> Inability to introspect your own word selections does not mean it’s meaningfully different from what an LLM does. There is plenty of evidence that humans do a lot of things that are not driven by conscious choice and then rationalize them after the fact.
This is correct and also completely irrelevant. I am describing what I experience, and how my experience seems very different from next-token prediction. I therefore conclude that it's plausible that more is involved than anything that can be reduced to next-token prediction.
> And how is that different? You are not-so-subtly implying that an LLM can’t consider an idea, but you haven’t established this as fact. That is, you are starting with the assumption that an LLM cannot possibly think and therefore cannot be intelligent, but this is just begging the question.
Language models can't think outside of producing tokens. There is nothing going on within an LLM when it's not producing tokens. The only thing it does is take in tokens as input and produce a token probability distribution as output. It seems plausible that this is not enough for general intelligence.
> You cannot spin experience as intelligence.
Correct, but I can point out that the only generally intelligent beings we know of have these sorts of experiences. Given that we know next to nothing about how a human's general intelligence works, it seems plausible that experience might play a part.
> LLMs have the experience of reading the entire internet, something you cannot conceive of.
I don't know that LLMs have an experience. But correct, I cannot conceive of what it feels like to have read and remembered the entire Internet. I am also a general intelligence and an LLM is not, so there's that.
> Certainly your experiences shape who you are. This is a different axis from intelligence, though.
I don't know enough about what makes up general intelligence to make this claim. I don't think you do either.
> Both sides of this claim seem dubious. The second half in particular seems to be founded on nothing. Again, you are asserting with no support that there is no thinking going on.
I'm telling you how these technologies work. When a language model isn't performing inference, it is not doing anything. A language model is a function which takes a token stream as input and produces a token probability distribution as output. By definition, there is no thinking outside of producing words. The function isn't running.
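To make that description concrete, here is a minimal Python sketch under loudly stated assumptions: `toy_model` is a hypothetical stand-in for a trained network (a pure function from a token list to a probability distribution over a tiny made-up vocabulary), and `generate` is a hypothetical greedy decoding loop. A real LLM replaces `toy_model` with a transformer, and real decoders usually sample from the distribution rather than always taking the most likely token.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy vocabulary; real models have tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "down", "<eos>"]

Distribution = Dict[str, float]


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(tokens: List[str]) -> Distribution:
    """Hypothetical stand-in for an LLM: tokens in, distribution out.

    It keeps no state between calls; everything it "knows" about the
    conversation is in the `tokens` argument.
    """
    # Toy scoring rule (an assumption for illustration only): strongly
    # favor the word that comes after the last token in VOCAB's order.
    last = VOCAB.index(tokens[-1]) if tokens and tokens[-1] in VOCAB else -1
    logits = [3.0 if i == last + 1 else 0.0 for i in range(len(VOCAB))]
    return dict(zip(VOCAB, softmax(logits)))


def generate(model: Callable[[List[str]], Distribution],
             prompt: List[str], max_new_tokens: int = 10) -> List[str]:
    """Greedy decoding: call the model once per new token, append, repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        dist = model(tokens)  # the only place the model does any work
        next_token = max(dist, key=dist.get)
        tokens.append(next_token)
        if next_token == "<eos>":
            break
    return tokens
```

For example, `generate(toy_model, ["the"])` returns `['the', 'cat', 'sat', 'down', '<eos>']`. Between calls to `model(tokens)`, none of the model's code runs. The sketch only shows the interface; it says nothing about how much computation can happen inside a single call.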
> I don’t think anyone sane is claiming an LLM can have a human experience. But it is not clear that a human experience is necessary for intelligence.
I 100% agree. It is not clear whether a human experience is necessary for intelligence. It is plausible that something approximating a human-like experience is necessary for intelligence. It is also plausible that something approximating human-like experience is completely unnecessary and you can make an AGI without such experiences.
It's plausible that next token prediction is sufficient for AGI. It's also plausible that it isn't.
> I don't know enough about what makes up general intelligence to make this claim. I don't think you do either.
This is the fundamental issue. No one seems capable of defining general intelligence. Ten years ago, most scientists would probably have agreed that the Turing test was sufficient, but the goalposts shifted when ChatGPT passed it.
If it’s not clear what AGI even means, it’s hard to say whether an LLM can achieve it, because the discussion devolves into pointing out that an LLM is not a human.
> Ten years ago, most scientists would probably have agreed that the Turing test was sufficient, but the goalposts shifted when ChatGPT passed it.
The popularity of, and lack of consensus on, the Chinese room thought experiment kind of implies that this is wrong? I don't think many scientists (or, more relevantly, philosophers of mind) would, even 10 years ago, have said, "if a computer is able to fool a human into thinking it's a human, then the computer must possess a general intelligence".
Even Turing's perspective was, from what I understand, that we must avoid treating something that might be sentient as a machine. He proposed that if a computer is able to act convincingly human, we ought to treat it as if it is a human, not because it must be a conscious being but because it might be.
Perhaps I am wrong or overstating the belief that the Turing test would be sufficient. My recollection is that it was widely regarded as a meaningful, if not conclusive, test.
> the Chinese room thought experiment
This is an interesting thought experiment but I think the “computers don’t understand” interpretation relies on magical thinking.
The notion that “systemic” understanding is not real is purely begging the question. It also ignores that a human is also a system.
> I'm telling you how these technologies work. When a language model isn't performing inference, it is not doing anything. A language model is a function which takes a token stream as input and produces a token probability distribution as output. By definition, there is no thinking outside of producing words. The function isn't running.
If what you are saying is true, then LLMs wouldn't be able to handle out-of-distribution math problems without resorting to tool use. Yet they can. When you ask a current-generation model to multiply some 8-digit numbers, and forbid it from using tools or writing a script, it will almost certainly give you the right answer. That includes local models that can't possibly cheat. LLMs are stochastic, but they are not parrots.
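Anyone who wants to test the multiplication claim against a local model could use a small harness like this Python sketch. `ask_model` is a hypothetical placeholder for whatever call your local model exposes (it is not any particular library's API). The harness draws random 8-digit pairs, of which there are (9 × 10^7)^2 ≈ 8.1 × 10^15 ordered combinations, far too many to have all appeared in training text, and compares each reply with Python's exact integer product.

```python
import random
import re
from typing import Callable, Optional, Tuple


def make_problem(rng: random.Random) -> Tuple[int, int]:
    """Draw two random 8-digit integers."""
    return rng.randint(10**7, 10**8 - 1), rng.randint(10**7, 10**8 - 1)


def parse_answer(text: str) -> Optional[int]:
    """Take the last integer in the reply, ignoring thousands separators."""
    numbers = re.findall(r"\d+", text.replace(",", ""))
    return int(numbers[-1]) if numbers else None


def score(ask_model: Callable[[str], str], trials: int = 20, seed: int = 0) -> float:
    """Fraction of random 8-digit multiplications answered exactly right.

    `ask_model` is a hypothetical callable: prompt string in, reply string out.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        a, b = make_problem(rng)
        prompt = (f"Without using tools or code, compute {a} * {b}. "
                  "Reply with the number only.")
        if parse_answer(ask_model(prompt)) == a * b:
            correct += 1
    return correct / trials
```

Python integers have arbitrary precision, so `a * b` is an exact reference answer; no floating-point rounding can blur the comparison.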
At the risk of sounding like an LLM myself, whatever process makes this possible is not simply next-token prediction in the pejorative sense you're applying to it. It can't be. The tokens in a transformer network are evidently not just words in a Markov chain but a substrate for reasoning. The model is generalizing processes it learned, somehow, in the course of merely being trained to predict the next token.
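For contrast, here is what "words in a Markov chain" means in the simplest case: a minimal Python sketch of a bigram model, offered as an illustration of the baseline and not as a description of how transformers work. Its next-word distribution comes only from counts of which word followed the previous word in its training text. A transformer also outputs a next-token distribution, but computes it from the entire context with learned weights rather than from a lookup table of counts.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram(corpus: List[str]) -> Dict[str, Counter]:
    """Count, for each word, which words followed it in the training text."""
    follows: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1
    return follows


def next_word_distribution(follows: Dict[str, Counter],
                           context: List[str]) -> Dict[str, float]:
    """Only the last word of the context matters; everything earlier is ignored."""
    if not context:
        return {}
    counts = follows.get(context[-1], Counter())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}
```

Trained on `"the cat sat on the mat".split()`, this model gives the same answer, `{'cat': 0.5, 'mat': 0.5}`, for any context ending in "the", and it has nothing to say about a word it never saw followed by anything.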
Mechanically, yes, next-token prediction is what it's doing, but that turns out to be a much more powerful mechanism than it appeared at first. My position is that our brains likely employ similar mechanism(s), albeit through very different means.
It is scarcely believable that this abstraction process is limited to keeping track of intermediate results in math problems. The implications should give the stochastic-parrot crowd some serious cognitive dissonance, but...
(Edit: it occurs to me that you are really arguing that the continuous versus discrete nature of human thinking is what's important here. If so, that sounds like a motte-and-bailey thing that doesn't move the needle on the argument that originally kicked off the subthread.)
(Edit 2, again due to rate-limiting: it does sound like you've fallen back to a continuous-versus-discrete argument, and that's not something I've personally thought much about or read much about. I stand by my point that the ability to do arithmetic without external tools is sufficient to dispense with the stochastic-parrot school of thought, and that's all I set out to argue here.)
> If what you are saying is true, then LLMs wouldn't be able to handle out-of-distribution math problems without resorting to tool use. Yet they can. When you ask a current-generation model to multiply some 8-digit numbers, and forbid it from using tools or writing a script, it will almost certainly give you the right answer. That includes local models that can't possibly cheat. LLMs are stochastic, but they are not parrots.
Okay, what do you think language models are doing when they're not producing token probability distributions? What processes do you think are going on when the function which predicts a token isn't running?
> At the risk of sounding like an LLM myself, whatever process makes this possible is not simply next-token prediction in the pejorative sense you're applying to it.
I don't know what pejorative sense you're implying here. I am, to the best of my ability, describing how the language model works. I genuinely believe that a language model is, in essence, a function which takes in a sequence of tokens and produces a token probability distribution as an output. If this is incorrect, please, correct me.
> Okay, what do you think language models are doing when they're not producing token probability distributions? What processes do you think are going on when the function which predicts a token isn't running?
What are you doing when you are not outputting tokens? You have a thought, evaluate it, refine it, repeat.
You’re not wrong that the basic building block is just “next token prediction”, but clearly the emergent behaviors exceed our intuition about what this process can achieve. We’re seeing novel proofs come out of these. Will this lead to AGI? That’s still TBD.
> I genuinely believe that a language model is, in essence, a function which takes in a sequence of tokens and produces a token probability distribution as an output. If this is incorrect, please, correct me.
The pejorative is that you imply this is a shallow and unthinking process. As I said earlier, you are literally a token generator on HN. You read someone’s comment, do some kind of processing, and output some tokens of your own.
> What are you doing when you are not outputting tokens? You have a thought, evaluate it, refine it, repeat.
I mean, I do think sometimes, even when not typing?
> Will this lead to AGI? That’s still TBD.
This is literally what I have been saying this whole time.
Since we agree, I will consider this conversation concluded.
He’s a time waster.
I bet the guy has never contributed a novel thought that could be argued to have moved anything of magnitude forward. If that is the case, he ought to stop writing as if he were capable of doing so, and he therefore has no understanding of what true intelligence is.