Is anyone with access able to confirm whether you can start this with a still image and a prompt?
The recent Google Veo 3 paper "Video models are zero-shot learners and reasoners" made a fascinating argument for video generation models as multi-purpose computer vision tools in the same way that LLMs are multi-purpose NLP tools. https://video-zero-shot.github.io/
It includes a bunch of interesting prompting examples in the appendix; it would be worth seeing how those fare against Sora 2.
I wrote some notes on that paper here: https://simonwillison.net/2025/Sep/27/video-models-are-zero-...
Yes, you can start with a still image and a prompt.
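
For anyone curious what image-plus-prompt generation looks like in code, here's a rough sketch using the google-genai SDK's Veo call (Veo 3 rather than Sora 2, since that's what the paper covers). The model ID, parameter names, and polling flow below are assumptions on my part rather than anything from the paper or the Sora 2 docs, so treat it as a sketch and check the current documentation:

    # Rough sketch only: image + prompt conditioning via the google-genai SDK.
    # Model ID, parameter names, and polling details are assumptions and may
    # not match the current API; consult the official docs before relying on it.
    import time

    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment

    # Kick off an asynchronous generation job seeded with a still image and a prompt.
    operation = client.models.generate_videos(
        model="veo-3.0-generate-preview",  # assumed model ID
        prompt="The cat slowly turns its head and yawns",
        image=types.Image(
            image_bytes=open("cat.png", "rb").read(),  # the starting frame
            mime_type="image/png",
        ),
    )

    # Video generation is long-running; poll the operation until it finishes.
    while not operation.done:
        time.sleep(10)
        operation = client.operations.get(operation)

    # Download and save the first generated clip.
    video = operation.response.generated_videos[0].video
    client.files.download(file=video)
    video.save("cat.mp4")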