Did they make human voices sound robotic on purpose? Is that some kind of AI fingerprinting? It's way too obvious.

It's very hard to generate good audio at the same time as video (simultaneous generation is necessary to maintain lip sync). Veo 3 et al. also have flat mono audio, but it's not as bad as in these Sora 2 demos.