> I find this dubious
I agree. In both cases a continuously varying voltage is driving speaker cone deflection. If the voltages of two different signals vary in precisely the same way, the cone will deflect to exactly the same degree, and the resulting pressure wave will generate the same resonant response from any surface it encounters. When properly implemented, today's high-end, esoteric ADCs and DACs have insane bandwidth, frequency response and fidelity far exceeding these requirements.
Some of the confusion comes from the fact that back when consumer audio transitioned to digital and these production workflows were new, some early digital recordings were incorrectly engineered or mastered, creating artifacts such as aliasing that critical listeners could hear. Some people assumed the artifacts they heard were innate to all digital audio rather than the result of an incorrect implementation of a new technology. Even today, it's possible to screw up the fidelity of a digital master, but it's rarely an issue because workflows are standardized and modern tooling has default presets based on well-validated audio science (for example: https://en.wikipedia.org/wiki/Noise_shaping#Dithering). But even in the analog era it was always a truism in audio and video engineering that "there are infinite ways to screw up a signal but only a few ways to preserve it." And it remains true today. To me, one of the best things about modern digital tooling is that it's much easier to verify correctness in the signal chain.
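
To make the aliasing point concrete, here's a rough sketch in plain Python. The function names and the 30 kHz / 44.1 kHz figures are my own illustrative choices, not taken from any particular recording or tool. It assumes a pure tone sampled with no anti-aliasing filter at all, which is the kind of mistake a badly engineered chain could make. An ultrasonic tone then produces exactly the same samples as an audible one, so after conversion nothing can tell them apart.

```python
import math


def alias_frequency(f_hz, fs_hz):
    """Frequency a pure tone at f_hz folds to when sampled at fs_hz
    with no anti-aliasing filter (hypothetical helper for illustration)."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))


def sample_cosine(f_hz, fs_hz, n_samples):
    """Ideal samples of a unit-amplitude cosine at f_hz, taken at fs_hz."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]


FS = 44_100          # CD sample rate, Hz
TONE = 30_000        # ultrasonic tone above the 22.05 kHz Nyquist limit

folded = alias_frequency(TONE, FS)

# Sample the ultrasonic tone and the tone it folds down to.
ultrasonic = sample_cosine(TONE, FS, 64)
audible = sample_cosine(folded, FS, 64)

# The two sample sequences agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(ultrasonic, audible))

print(f"{TONE} Hz sampled at {FS} Hz aliases to {folded} Hz")
print(f"largest sample difference: {max_diff:.2e}")
```

A correctly built chain low-pass filters the input below half the sample rate before the ADC, so the 30 kHz content never reaches the sampler. That is why this is an implementation error rather than something inherent to digital audio.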