I'm not sure if it's has any real bearing on real-world performance, but technically next token prediction makes it an online algorithm and they can be provably worse than (good) offline algorithms.