So we've basically taken the concept of branch prediction from CPUs and applied it to LLMs?
Maybe at very high level of abstraction, but there's no branching involved.
Well, there are multiple token proposals processed in parallel, from which only one is picked, seems like branching to me. The only difference is that in case of CPU there is always only one possible branch that is correct.
Well, not exactly, but that was the dream we were sold (here be dragons)
The concept of predicting future elements in a series is not specific to CS. It's older than computers.
Well, the TPUs they're running on don't have branch prediction, so that had to end up somewhere in the stack.
Maybe at very high level of abstraction, but there's no branching involved.
Well, there are multiple token proposals processed in parallel, from which only one is picked, seems like branching to me. The only difference is that in case of CPU there is always only one possible branch that is correct.
Well, not exactly, but that was the dream we were sold (here be dragons)
The concept of predicting future elements in a series is not specific to CS. It's older than computers.
Well, the TPUs they're running on don't have branch prediction, so that had to end up somewhere in the stack.