I was a bit disappointed, even though there is no reason I should expect much in this space
- Tennis clip => ball sound is badly out of sync with the hit
- Dark, moody beach video with no one on screen => very upbeat audio, lots of laughter, as if it were summer on a busy beach
- Music inpainting completely switches the style of the audio (e.g. on the siren)
- "Electronic music with some buildup": the gen just turns the volume up?
I guess we still have some way to go, but it feels like early image generation, with its mangled hands and off-looking visual features. At least the generations are not complete nonsense