I was a bit disappointed, even though there is no reason I should expect much in this space.

- Tennis clip => the ball sound is badly out of sync with the hit

- Dark, moody beach video with no one on screen => very upbeat audio, lots of laughter, as if it were summer on a busy beach

- Music inpainting completely switches the style of the audio (e.g. on the siren)

- "Electronic music with some buildup": the gen just turns the volume up?

I guess we still have some way to go, but it feels like early image generation, with its mangled hands and visual features. At least the generations are not nonsensical.