Did they make the human voices sound robotic on purpose? Is that some kind of AI fingerprinting? It's way too obvious.
It's very hard to generate good audio at the same time as the video (simultaneous generation is necessary to maintain lip sync). Veo 3 et al. also have flat mono audio, but it's not as bad as in these Sora 2 demos.