I'll definitely believe that for video generation models, but those are also very compute-intensive for rather middling results.