given the architecture, is there a way to force the use of specific phonemes for hard-to-pronounce words? If so that's big
Yes. Specifically, the pipeline is text -> phonemizer -> phonemized text -> TTS model -> audio You just have to modify the phonemizer's dictionary.
Yes. Specifically, the pipeline is text -> phonemizer -> phonemized text -> TTS model -> audio You just have to modify the phonemizer's dictionary.