VFX artist and developer here, deep into this stuff, and it is really not there. It's an island unto itself, barely controllable and barely usable with other media. They are just now getting around to generating alpha channels, and virtually none of the existing pipelines for AI video or image generation tools can incorporate or work with alpha channels. This is just one of several hundred aspects of incompatibility. It really seriously appears as if no one at any of the AI video generation research teams has any professional media production experience, or has even bothered to look at existing media production data standards, and what they are making tool-wise is incompatible in every possible respect.
"It really seriously appears as if no one at any of the AI video generation research teams has any professional media production experience, or has even bothered to look at existing media production data standards,"
I had to chuckle at this, because the arrogance of OAI et al. will catch up with them in the end when these projects continue to be negative NPV.
Honest question: do you think these things will make a big difference if these videos can be made in 15 minutes for $20 or whatever?
Won’t the industry change to adopt that massive price cut/productivity gain?
The cost is and will be more than that, the time will be more, and I really think people are underestimating the time it takes to create good stories. Sure, there will be online locations to make short-form video of all kinds. But people have had video cameras in their pockets for a very long time, and being a hobby filmmaker is still not really popular. The AI video sites now are 95% people fascinated with the ability to make video at all, and after a bit their interest dies, because actually making anything requires real work even with AI helping left and right. Consistency is a harsh mistress, and AI video can only hold it for a short duration. So any narrative that makes a story worth watching, one that is not AI slop, will continue to require humans and human creativity, for the consistency that gives a story the integrity that makes it worth watching. At least for audiences that care. No doubt, there are commercial forces working to develop audiences that like and prefer AI slop.
Do you even see a path from the current AI systems to something that has that near-total control over every detail that is required for high quality VFX work?
Yes. Adding alpha channels would be step one. Then perhaps incorporate the "element" concept: basically any identifiable visual thing, which is what VFX treats as a composite-capable element. Then build a whole visual scene description prose that is what we give to a video AI, where that prose is high-level language where necessary and element-wise specific where necessary. Base that scene description prose directly on the language used by filmmakers, just adopt their terminology, and then track the industry's jargon within the models. That way anyone working in media will auto-magically know how to control them.
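To make that a bit more concrete, here is a rough sketch in Python of what such a scene description could look like as data. Everything in it is hypothetical and just for illustration: the `Element` and `Shot` names, their fields, and the text layout that `describe()` produces are my own assumptions, not the input format of any existing video model or VFX tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Hypothetical composite-capable element: one identifiable visual thing."""
    name: str                 # e.g. "rain", "hero_car"
    description: str          # free-form prose, as specific as needed
    layer: int = 0            # compositing order, 0 = backmost
    needs_alpha: bool = True  # should the model deliver this with an alpha channel?


@dataclass
class Shot:
    """Hypothetical shot description using ordinary filmmaking terms."""
    framing: str              # e.g. "wide", "medium close-up"
    camera_move: str          # e.g. "locked off", "dolly in"
    direction: str            # high-level prose for the shot as a whole
    elements: List[Element] = field(default_factory=list)

    def describe(self) -> str:
        """Render the shot as text: one high-level line, then one line per element."""
        lines = [f"SHOT: {self.framing}, {self.camera_move}. {self.direction}"]
        # List elements back to front, the order a compositor would stack them.
        for e in sorted(self.elements, key=lambda e: e.layer):
            alpha = "with alpha" if e.needs_alpha else "no alpha"
            lines.append(f"  ELEMENT {e.name} (layer {e.layer}, {alpha}): {e.description}")
        return "\n".join(lines)


shot = Shot(
    framing="wide",
    camera_move="locked off",
    direction="Dusk street.",
    elements=[
        Element("rain", "light drizzle", layer=1),
        Element("street", "wet asphalt", layer=0, needs_alpha=False),
    ],
)
print(shot.describe())
```

The point of the split is that the shot line stays in director-level language while each element line can be as specific as a compositor needs, including whether it should arrive as a separate layer with alpha.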
We are at a point now where the problem is not how to write the software, but how to describe to the software what we want. Video and filmmaking are so open-ended that AI needs more information. Typically that information comes from the consistency of a director and their team during production. AI has neither the information needed for consistent imagery nor the narrative, and the perspective on that narrative, that a human director and team bring. In time, AI will develop large enough contexts, but will the hardware to run that be affordable? There is a huge amount of context in both an entire script and the worldview a film crew brings to any script, and for that reason I think many of the traditional film roles (VFX included) are not going to suddenly disappear. AI video does not replace their consistency at their budget, hands down.
When AI video is able to be just one part of the skill set, for example when it is compatible with compositing and editing and knows that terminology, AI video will be adopted more. Right now, it is designed as an all-or-nothing offering.
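For anyone wondering why alpha is the gatekeeper for compositing compatibility: layering is built on the standard Porter-Duff "over" operation, and it cannot be computed without a per-pixel alpha. A minimal sketch in Python, assuming premultiplied RGBA pixels stored as plain float tuples in the 0..1 range (the function name `over` and the tuple layout are my choices for illustration, not any particular tool's API):

```python
def over(fg, bg):
    """Porter-Duff 'over': place a premultiplied RGBA pixel on top of another."""
    fr, fg_g, fb, fa = fg
    br, bg_g, bb, ba = bg
    k = 1.0 - fa  # how much of the background shows through the foreground
    return (fr + br * k, fg_g + bg_g * k, fb + bb * k, fa + ba * k)


# A half-transparent red element over an opaque blue plate.
half_red = (0.5, 0.0, 0.0, 0.5)  # premultiplied: color already scaled by alpha
blue_plate = (0.0, 0.0, 1.0, 1.0)
print(over(half_red, blue_plate))
```

A generated clip with no alpha has nothing to supply for `fa`, so it can only ever be the bottom layer or be keyed or rotoscoped by hand.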