Impressive technical achievement, but as for whether I'd use it: oof, that male voice sounds like one of those fake-excited newsreaders, always on the edge of running out of breath. The female one is better, but it still sounds like someone reading out an ad for a product they were told to act extra excited about. I assume this is what most of the training data was like, not an intentional setting for the demo. I'm unsure whether I could get used to that.
I use TTS on my phone regularly and recently also tried a new project on F-Droid called SherpaTTS, which downloads models from Hugging Face. They're super heavy (the phone suspends other apps to disk while it runs) and they sound good, but the very first news article I tried already had one or two mispronunciations. It guesses how to say uncommon or new words instead of following logical rules to turn text into speech.
Google and Samsung each have a TTS engine preinstalled on my device, and those sound and work fine. They're a tad monotonous, but they always seem to pronounce things the same way, so you can always work out what the text said.
eSpeak (or eSpeak NG) is the absolute worst, but after 30 seconds of listening closely you get used to it and can understand everything fine. I don't know if it's the best open-source option (there are probably others I should be trying), but it's at least the most reliable: you'll always understand what's being said, and you can install it on any device without licensing issues.
If anyone else wants to try sherpa-onnx, you can use this: https://github.com/willwade/tts-wrapper. We recently added the Kokoro models, which should sound a lot better. There are a LOT of models to choose from. I have a feeling the F-Droid app isn't handling cold starts very well.
If anyone wants to test ready-to-install Android APKs: https://k2-fsa.github.io/sherpa/onnx/tts/apk.html
Thanks a lot for the detailed feedback. We are working on some models that do not use a phonemizer.
RHVoice is pretty good, imho.