I'm working on a voice cloning version of my TTS model, a highly upgraded VITS:
https://x.com/ZDi____/status/2013655958027669958
Right now, I only have single-speaker checkpoints (as per the old video). That will change soon.
VITS is such a cool model (and paper): fast, minimal, trainable. Meta took it to the extreme, for about 1000 languages.
It seems like you have been working on this application for some time. I will go through your code, but could you provide some context about the upgrades and changes you have made, or a post describing your efforts?
Cool nonetheless!
Recommendations for a local text-to-speech synth? Last year I played with Piper-TTS, Chatterbox, and some others. Ideally it would support English, Spanish, and Chinese.
Multilingual and local? Try out Supertonic 2.