I'm not sure if it's has any real bearing on real-world performance, but technically next token prediction makes it an online algorithm and they can be provably worse than (good) offline algorithms.
I'm not sure if it's has any real bearing on real-world performance, but technically next token prediction makes it an online algorithm and they can be provably worse than (good) offline algorithms.