There were a lot of shortcomings.

- For the clip of the woman playing what I think was an erhu[1], the generated audio seemed to be imitating traditional music played on that instrument, but really badly (it sounded much more like a human voice than the actual instrument does). Also, I'm not even sure if it was able to tell which instrument it was, or if it was picking up on other cues from the video (which could be problematic, e.g. if it profiles people based on their race and attire).

- Most of the sound lagged noticeably behind the visual cues. Not sure why.

- The nature sounds were pretty muddy

- (I realize this is from video to music, but) the video with pumping upbeat music set to the text "Maddox White witnessed his father getting butchered by the Capo of the Italian mob" was almost comically out of touch with the source

Nevertheless, it's an interesting demo, and it highlights more applications for AI where I expect we'll see massive improvements over the next few years! So despite the shortcomings, I agree it's still quite impressive.

[1] https://en.wikipedia.org/wiki/Erhu