Hacker News

I'm curious why smallish TTS models have a metallic voice quality.

The pronunciation sounds about right. I thought that was the hard part, and the model does it well. But shouldn't voice timbre be simpler to fix? For example, might a simple FIR filter improve it?
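To make the FIR idea concrete, here is a rough sketch of what I mean, in plain Python with no audio libraries. The function names, the 4 kHz cutoff, the 16 kHz sample rate, and the 51-tap length are all made up for illustration and are not taken from any TTS model. Note that a fixed FIR like this only re-weights frequencies already present in the signal, so at best it acts as an EQ.

```python
import math

def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=51):
    """Windowed-sinc low-pass design (Hamming window), scaled to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2                 # center of the (odd-length) filter
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response; the x == 0 case is the sinc limit.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(signal, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], assuming silence before n = 0."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

if __name__ == "__main__":
    sr = 16000  # hypothetical sample rate
    taps = lowpass_taps(4000, sr)
    low = [math.sin(2 * math.pi * 500 * n / sr) for n in range(691)]
    high = [math.sin(2 * math.pi * 7000 * n / sr) for n in range(691)]
    # Skip the first len(taps) samples so only the settled output is measured.
    print("500 Hz tone RMS after filter:", rms(fir_filter(low, taps)[len(taps):]))
    print("7 kHz tone RMS after filter:", rms(fir_filter(high, taps)[len(taps):]))
```

The 500 Hz tone should come through nearly unchanged and the 7 kHz tone should be strongly attenuated. Whether any such fixed filter would actually remove the metallic sound is exactly what I'm unsure about.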

codedokode 2 days ago

The "metallic" quality is probably due to a lack of detail and cannot be fixed that easily.

nickpsecurity 2 days ago

We change our tone based on personal style, emotion, context, and other factors. An accurate generator might need to encode all that information in the model. It will be larger than a model that doesn't do all of that.