A robotic, explicitly non-natural voice would be perfectly acceptable, and even desirable, in many situations[...]it'd be enough if the speech is easy to recognize.

We've had formant synths for several decades, and they're perfectly understandable and require a tiny amount of computing power, but people tend not to want to listen to them:

https://en.wikipedia.org/wiki/Software_Automatic_Mouth

https://simulationcorner.net/index.php?page=sam (try it yourself to hear what it sounds like)

SAM and the way it works are not what people typically associate with the term "formant synthesizer."

DECtalk[1,2] would be a much better example; that's as formant as you get.

[1] https://en.wikipedia.org/wiki/DECtalk [2] https://webspeak.terminal.ink

Well, this one is a bit too jarring to the ears.

But there is no latency, as opposed to KittenTTS, so it certainly has its applications too.

Try this demo, which has more knobs:

https://discordier.github.io/sam/

I think it's charming

Huh, now I know what Airdorf used in Faith: Unholy Trinity.

Yeah, blind people love Eloquence.