The article covers that.

In short, audio and visual perception do not map perfectly. Humans don't perceive either one linearly, so a perfect A/D conversion followed by a perfect D/A conversion yields unsatisfying results.
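
To make the non-linearity point concrete, here is a rough sketch for the audio side only. It compares a plain linear 8-bit round trip with one that uses mu-law companding, which spaces the quantization steps roughly logarithmically. The choice of mu-law, the constant `MU = 255`, the 8-bit depth, and the sample value are all my own assumptions for illustration; none of them come from the article.

```python
import math

MU = 255  # assumed companding constant (the value commonly used for 8-bit mu-law)

def mu_compress(x):
    """Map x in [-1, 1] onto a roughly logarithmic scale, still in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize(x, bits=8):
    """Uniform quantizer on [-1, 1]: snap x to the nearest of 2**bits levels."""
    steps = 2 ** bits - 1
    return round((x + 1) / 2 * steps) / steps * 2 - 1

def linear_roundtrip(x, bits=8):
    """Linear A/D then D/A: evenly spaced steps across the whole range."""
    return quantize(x, bits)

def mulaw_roundtrip(x, bits=8):
    """Compress, quantize, expand: finer steps near zero, coarser near full scale."""
    return mu_expand(quantize(mu_compress(x), bits))

quiet = 0.003  # hypothetical quiet sample, well below full scale
linear_error = abs(linear_roundtrip(quiet) - quiet)
mulaw_error = abs(mulaw_roundtrip(quiet) - quiet)
print(f"linear error: {linear_error:.6f}")
print(f"mu-law error: {mulaw_error:.6f}")
```

With the same number of bits, the companded version lands closer to the quiet sample, because it spends more of its levels where hearing is more sensitive to relative change. The visual analogue would be something like gamma encoding, but that is not shown here.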