> We really have no idea how the ability to have a conversation emerged from predicting the next token.
Uh yes, we do. It works in precisely the same way that you can walk from "here" to "there" by taking a step towards "there", and then repeating. The cognitive dissonance comes when we conflate this way of "having a conversation" with the way two people converse: we assume that because they produce similar outputs they must be "doing the same thing", and then it's hard to see how LLMs could be doing that.
Sometimes things seem unbelievable simply because they aren't true.
> It works in precisely the same way that you can walk from "here" to "there" by taking a step towards "there", and then repeating.
It's funny how, in order to explain one complex phenomenon, you took an even more complex phenomenon as if it somehow simplifies it.
Sorry, can't tell if that's sarcasm or not.
I wasn't referring to the biomechanical process of walking, I was referring to the process of gradient descent, which is well understood and yes, quite simple.
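For anyone who wants the "take a step towards there, then repeat" picture spelled out, here is a rough sketch of plain gradient descent on a toy one-dimensional function. The function, the target value of 3.0, the learning rate, and the step count are all made up for illustration; this is not how any particular LLM is trained, just the bare loop.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly take a small step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # one "step towards there"
    return x


# Toy objective (an assumption for illustration): f(x) = (x - target)^2,
# whose gradient is 2 * (x - target) and whose minimum sits at x = target.
target = 3.0
x_final = gradient_descent(lambda x: 2 * (x - target), x0=0.0)
print(x_final)  # ends up very close to target
```

Each iteration only looks at the local slope, yet repeating it is enough to get from "here" (x0) to "there" (the minimum).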