You're right that world models are the bottleneck, but people underestimate the staggering complexity gap between modeling the physical world and modeling a one-dimensional stream of text. Not only is the real world high-dimensional, continuous, noisy, and vastly more information dense, it's also not something for which there is an abundance of training data.