The most interesting thing by far is the ability to include video clips of people and products as part of the prompt and then generate a realistic video featuring them. On the technical side, I'm guessing they've just trained the model to conditionally generate videos based on predetermined characters -- it's likely more of a data innovation than anything architectural. As a user, though, the feature is very cool and will likely make Sora 2 very useful commercially.
However, I still don't see how OpenAI beats Google in video generation. If this really is mostly a data innovation, Google can replicate and improve on it given their ownership of YouTube. I'd be surprised if they didn't already have something like this internally.
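To make the "conditioning" guess concrete, here's a minimal sketch of what reference-clip conditioning could look like in a diffusion-transformer-style setup. All module names, shapes, and the overall structure are my own illustration, not anything OpenAI has described:

```python
import torch
import torch.nn as nn

class ReferenceConditionedDenoiser(nn.Module):
    """Toy denoiser that attends over text-prompt tokens AND tokens encoded
    from a reference clip of the person/product being inserted."""

    def __init__(self, latent_dim=64, cond_dim=512, dim=256):
        super().__init__()
        self.latent_in = nn.Linear(latent_dim, dim)   # noisy video-latent tokens
        self.text_in = nn.Linear(cond_dim, dim)       # text-prompt embeddings
        self.ref_in = nn.Linear(cond_dim, dim)        # reference-clip embeddings
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.latent_out = nn.Linear(dim, latent_dim)

    def forward(self, noisy_latents, text_tokens, ref_tokens):
        n = noisy_latents.shape[1]
        # Concatenate along the sequence axis so self-attention can mix the
        # video latents with both conditioning streams.
        seq = torch.cat(
            [self.latent_in(noisy_latents),
             self.text_in(text_tokens),
             self.ref_in(ref_tokens)],
            dim=1,
        )
        # Only the positions corresponding to video latents are denoised.
        return self.latent_out(self.backbone(seq)[:, :n])

# Example shapes: 16 video-latent tokens, 8 text tokens, 8 reference-clip tokens.
model = ReferenceConditionedDenoiser()
pred = model(torch.randn(1, 16, 64), torch.randn(1, 8, 512), torch.randn(1, 8, 512))
print(pred.shape)  # torch.Size([1, 16, 64])
```

If something like this is all that changed architecturally, the hard part would be assembling paired (reference clip, target video) training data at scale, which is why it reads as a data innovation rather than a modeling one.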
> the ability to include video clips of people and products as a part of the prompt and then create a realistic video with that
This is something I would not like to see. I prefer product videos to be real, since I'm taking a risk with my money. If the product's depiction is hallucinated or unrealistic, that would amount to a kind of fraud.
Deepfakes require zero work now
I believe existing laws already cover that issue.