The toilet flushing one is full of weird, unrelated noises.
The tennis video, as others commented, is good, but there is a noticeable delay between the action and the sound. And the "loving couple holding AI hands and then dancing", well, the input is already cringe enough.
For all these diffusion models, it looks like we are 90% there; now we just need the final 90%.