Ah yes, the classic “because it can” argument. I’ll take that to mean you don’t know what you’re talking about.

It seems you are confusing this with a personal opinion. This is not my opinion. This is merely the consensus of current research.

See, for example:

[1] https://arxiv.org/pdf/2410.18072

[2] https://arxiv.org/pdf/2411.02914v1

[3] https://openai.com/index/video-generation-models-as-world-si...

But even if you knew nothing about this topic, one observation should give you a big clue as to what is happening: a model cannot possibly store the amount of video data it was trained on, so it cannot simply be regurgitating it.
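To make the storage point concrete, here is a rough back-of-envelope sketch. Every number in it (hours of training video, bitrate, parameter count, bytes per parameter) is a made-up placeholder that I am assuming purely for illustration, not a figure from any of the links above; plug in whatever values you think are realistic.

```python
# Back-of-envelope check: could model weights hold the training video verbatim?
# All numeric inputs below are illustrative assumptions, not published figures.

def video_bytes(hours: float, bitrate_mbps: float) -> float:
    """Bytes needed for `hours` of video at `bitrate_mbps` megabits per second."""
    seconds = hours * 3600
    return seconds * bitrate_mbps * 1e6 / 8  # bits -> bytes

def weight_bytes(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Bytes occupied by the weights (default assumes 16-bit parameters)."""
    return n_params * bytes_per_param

hypothetical_hours = 1_000_000   # assumed size of the training set
hypothetical_bitrate = 2.0       # assumed Mbit/s for already-compressed video
hypothetical_params = 10e9       # assumed parameter count

data = video_bytes(hypothetical_hours, hypothetical_bitrate)
weights = weight_bytes(hypothetical_params)
ratio = data / weights

print(f"video: {data / 1e12:.0f} TB")
print(f"weights: {weights / 1e9:.0f} GB")
print(f"video is about {ratio:.0f}x larger than the weights")
```

With these placeholder values the compressed video is several orders of magnitude larger than the weights. The exact ratio depends entirely on the assumptions, but the gap is large enough that verbatim storage is hard to argue for under any plausible inputs.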