A robotic, explicitly non-natural voice would be perfectly acceptable, and even desirable, in many situations [...] it'd be enough if the speech is easy to recognize.
We've had formant synths for several decades, and they're perfectly understandable and require a tiny amount of computing power, but people tend not to want to listen to them:
https://en.wikipedia.org/wiki/Software_Automatic_Mouth
https://simulationcorner.net/index.php?page=sam (try it yourself to hear what it sounds like)
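To give a sense of why formant synthesis needs so little compute, here is a minimal sketch (not SAM's or any real product's implementation): a pulse train at the pitch frequency is run through a few two-pole resonators, one per formant. The resonator recurrence is the standard two-pole digital resonator form; the function names, the formant frequencies (roughly an "ah" vowel) and the bandwidths are illustrative assumptions, not values taken from any of the synths linked here.

```python
import math
import struct
import wave


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (one formant)."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(samples, freq_hz, bandwidth_hz, sample_rate):
    """Run samples through a single two-pole resonator."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(pitch_hz, duration_s, sample_rate):
    """Crude glottal source: one unit impulse per pitch period."""
    period = int(sample_rate / pitch_hz)
    n = int(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synth_vowel(formants, pitch_hz=110.0, duration_s=0.5, sample_rate=16000):
    """Cascade the source through each (frequency, bandwidth) resonator."""
    signal = impulse_train(pitch_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalize to [-1, 1]


def write_wav(path, samples, sample_rate=16000):
    """Write mono 16-bit PCM using only the standard library."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


# Assumed, approximate formants for an "ah"-like vowel: (frequency Hz, bandwidth Hz).
AH_FORMANTS = [(730.0, 80.0), (1090.0, 90.0), (2440.0, 120.0)]
```

Something like `write_wav("ah.wav", synth_vowel(AH_FORMANTS))` should give a buzzy, vowel-like tone. The per-sample cost is a handful of multiply-adds per formant, which is the whole point; a real synth adds a better glottal source, noise for fricatives, and rules that move the formants over time.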
SAM, and the way it works, is not what people typically associate with the term "formant synthesizer."
DECtalk [1,2] would be a much better example; that's as formant as you get.
[1] https://en.wikipedia.org/wiki/DECtalk
[2] https://webspeak.terminal.ink
Well, this one is a bit too jarring to the ears.
But there is no latency, as opposed to KittenTTS, so it certainly has its applications too.
Try this demo, which has more knobs:
https://discordier.github.io/sam/
I think it's charming
Huh, now I know what Airdorf used in Faith: Unholy Trinity.
Yeah, blind people love Eloquence.