Is anyone with access able to confirm whether you can start this with a still image and a prompt?
The recent Google Veo 3 paper "Video models are zero-shot learners and reasoners" made a fascinating argument for video generation models as multi-purpose computer vision tools in the same way that LLMs are multi-purpose NLP tools. https://video-zero-shot.github.io/
It includes a bunch of interesting prompting examples in the appendix; it would be worth seeing how those fare against Sora 2.
I wrote some notes on that paper here: https://simonwillison.net/2025/Sep/27/video-models-are-zero-...
Yes, you can start with a still image and a prompt.
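
For anyone curious what image-plus-prompt generation looks like in code, here's a rough sketch using the google-genai SDK's Veo call (Veo 3 rather than Sora 2, since that's what the paper covers). The model ID, parameter names, and polling flow below are assumptions on my part rather than anything from the paper or the Sora 2 docs, so treat it as a sketch and check the current documentation:

    # Rough sketch only: image + prompt conditioning via the google-genai SDK.
    # Model ID, parameter names, and polling details are assumptions and may
    # not match the current API; consult the official docs before relying on it.
    import time

    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment

    # Kick off an asynchronous generation job seeded with a still image and a prompt.
    operation = client.models.generate_videos(
        model="veo-3.0-generate-preview",  # assumed model ID
        prompt="The cat slowly turns its head and yawns",
        image=types.Image(
            image_bytes=open("cat.png", "rb").read(),  # the starting frame
            mime_type="image/png",
        ),
    )

    # Video generation is long-running; poll the operation until it finishes.
    while not operation.done:
        time.sleep(10)
        operation = client.operations.get(operation)

    # Download and save the first generated clip.
    video = operation.response.generated_videos[0].video
    client.files.download(file=video)
    video.save("cat.mp4")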