The trouble is the lack of training data available to these models compared to ones like Seedance and Kling, which seem to be tapping into an effectively unlimited video inventory. Models like LTX are technically good, but they struggle with slightly unusual camera movements or with the subject interacting with objects. As a recent example, we had to take sample videos generated by closed-source models and use those for the final video.
I tend to think of these NV Labs models as architectural demos and ‘free razor blades’ — they’re more intended to inform internal R&D, get customers something that lets them do what they want quickly, and enhance the state of the art.
In this case, what looks interesting is the one-minute coherence and the massive speedup - they claim 36x over open models with similar capabilities. You can tell they aren't aiming for state-of-the-art visuals; the output quality looks very SD 1.5.
Agreed on the marketing angle. But beyond that, what seems to matter is access to data - look at Seedance, the various Kling models, etc., which are far ahead of the others.
It’s hard to believe that Google doesn’t have an amazing model in-house, given all that YouTube content. But agreed, the Chinese models have seemed best over the last year or so, and agreed that an open policy on training data definitely makes for better quality.