yes, those are bottlenecks that world models don't solve. but the promise of world models is that, unlike LLMs, they might be able to learn things about the world that humans haven't written down. For example, we still don't fully understand how insects fly. A world model trained on thousands of videos of insects could make novel observations about insect trajectories. The premise is that despite being here for millennia, humans have only observed and recorded a tiny fraction of the world.

So I do buy his idea. But I disagree that you need world models to reach human-level capabilities. IMO there's no fundamental reason why models can't develop human-level understanding from the observations humans have already recorded.