Hacker News

Web version: https://clowerweb.github.io/kitten-tts-web-demo/

It sounds ok, but impressive for the size.

Does anybody find it funny that sci-fi movies have to heavily distort "robot voices" to make them sound "convincingly robotic"? A robotic, explicitly non-natural voice would be perfectly acceptable, and even desirable, in many situations. I don't expect a smart toaster to talk like a BBC host; it'd be enough if the speech is easy to recognize.

userbinator 2 days ago [ - ]

> A robotic, explicitly non-natural voice would be perfectly acceptable, and even desirable, in many situations [...] it'd be enough if the speech is easy to recognize.

We've had formant synths for several decades, and they're perfectly understandable and require a tiny amount of computing power, but people tend not to want to listen to them:

https://en.wikipedia.org/wiki/Software_Automatic_Mouth

https://simulationcorner.net/index.php?page=sam (try it yourself to hear what it sounds like)

miki123211 2 days ago [ - ]

SAM and the way it works is not what people typically associate with the term "formant synthesizer."

DECtalk[1,2] would be a much better example; that's as formant as you get.

[1] https://en.wikipedia.org/wiki/DECtalk [2] https://webspeak.terminal.ink

saretup 2 days ago [ - ]

Well, this one is a bit too jarring to the ears.

rixed 2 days ago [ - ]

But there is no latency, as opposed to KittenTTS, so it certainly has its applications too.

cess11 2 days ago [ - ]

Try this demo, which has more knobs:

https://discordier.github.io/sam/

actionfromafar 2 days ago [ - ]

I think it's charming

boobsbr 2 days ago [ - ]

Huh, now I know what Airdorf used in Faith: Unholy Trinity.

tapper 2 days ago [ - ]

Yeah, blind people love Eloquence.

roywiggins 2 days ago [ - ]

This one is at least an interesting idea: https://genderlessvoice.com/

cosmojg 2 days ago [ - ]

The voice sounds great! I find it quite aesthetically pleasing, but it's far from genderless.

a96 28 minutes ago [ - ]

So, what's the gender?

dang 2 days ago [ - ]

Meet Q, a Genderless Voice - https://news.ycombinator.com/item?id=19505835 - March 2019 (235 comments)

degamad 2 days ago [ - ]

Interesting concept, but why is that site filled with Top X blogspam?

pbronez 2 days ago [ - ]

The YouTube video [1] was published in 2019. The blog spam posts range from Nov 2022 to July 2023.

Other than the video, the only relevant content is on the about page [2]. It says the voice is a collaboration between 5 different entities, including advocacy groups, marketing firms and a music producer.

The video is the only example of the voice in use. There is no API, weights, SDK, etc.

I suspect this was a one-off marketing stunt sponsored by Copenhagen Pride before the pandemic. The initial reaction was strong enough that a couple of years later they were still getting a small but steady flow of traffic. One of the involved marketing firms decided to monetize the asset and defaced it with blog spam.

[1] https://www.youtube.com/watch?v=lvv6zYOQqm0

[2] https://genderlessvoice.com/about/

cyberax 2 days ago [ - ]

It doesn't sound genderless.

qmr 11 hours ago [ - ]

Thanks, I hate it.

pbronez 2 days ago [ - ]

Huh. Sounds perfectly intelligible and definitively artificial. Feels weakly feminine to me, but only because I was primed to think about gender from the branding.

It’s a good choice for a robot voice. It’s easier to understand than the formant synths or deliberately distorted human voices. The genderless aspect is alien enough to avoid the uncanny valley. You intuitively know you’re dealing with something a little different.

mfro 2 days ago [ - ]

In the Culture novels, Iain Banks imagines that we would become uncomfortable with the uncanny realism of transmitted voices / holograms, and intentionally include some level of distortion to indicate you're speaking to an image.

incone123 2 days ago [ - ]

Depends on the movie. Ash and Bishop in the Alien franchise sound human until there's a dramatic reason to sound more 'robotic'.

I agree with your wider point. I use Google TTS with Moon+Reader all the time (I tried audiobooks read by real humans, but I prefer the consistency of TTS).

regularfry 2 days ago [ - ]

Slightly different there because it's important in both cases that Ripley (and we) can't tell they're androids until it's explicitly uncovered. The whole point is that they're not presented as artificial. Same in Blade Runner: "more human than human". You don't have a film without the ambiguity there.

incone123 2 days ago [ - ]

You're right. I should have used Marvin from Hitchhiker's Guide as an example instead. There's very light processing on his speech.

Twirrim 2 days ago [ - ]

> I don't expect a smart toaster to talk like a BBC host;

Well sure, the BBC have already established that it's supposed to sound like a Brit doing an impersonation of an American: https://www.youtube.com/watch?v=LRq_SAuQDec

looperhacks 2 days ago [ - ]

I remember that the novelization of The Fifth Element describes cops being taught to speak as robotically as possible when using speakers, for some reason. I always found it weird that someone would _want_ that.

addandsubtract 2 days ago [ - ]

If you're on a Mac, you can type "say [thing to say]" into your terminal.

msgodel 2 days ago [ - ]

I personally prefer the older synthetic voices for TTS when the text is coming from software or a language model.

bkyan 2 days ago [ - ]

I got an error when I tried the demo with 6 sentences, but it worked great when I reduced the text to 3 sentences. Is the length limit due to the model or just a limitation for the demo?

divamgupta 2 days ago [ - ]

We don't have chunking enabled yet. We will add it soon, which will remove the length limitation.
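
For readers wondering what chunking might look like here, below is a minimal sketch, not the KittenTTS implementation. It assumes the model accepts short pieces of text and that the audio for each piece can simply be concatenated afterwards. The function names `chunk_text` and `_split_long` and the 200-character default are hypothetical.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper: the name and the default limit are illustrative only.
    A single word longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Sentences over the limit are first broken at word boundaries.
        pieces = [sentence]
        if len(sentence) > max_chars:
            pieces = _split_long(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate  # still fits: keep growing this chunk
            else:
                if current:
                    chunks.append(current)
                current = piece  # start a new chunk
    if current:
        chunks.append(current)
    return chunks


def _split_long(sentence: str, max_chars: int) -> list[str]:
    """Break an over-long sentence into pieces at word boundaries."""
    pieces: list[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces
```

Each returned chunk would then be synthesized on its own. Splitting at sentence boundaries keeps the cuts at places where a pause sounds natural.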

cess11 2 days ago [ - ]

Perhaps a length limit? I tried this:

"This first Book proposes, first in brief, the whole Subject, Mans disobedience, and the loss thereupon of Paradise wherein he was plac't: Then touches the prime cause of his fall, the Serpent, or rather Satan in the Serpent; who revolting from God, and drawing to his side many Legions of Angels, was by the command of God driven out of Heaven with all his Crew into the great Deep."

It takes a while until it starts generating sound on my i7 cores but it kind of works.

This also works:

"blah. bleh. blih. bloh. blyh. bluh."

So I don't think it's a limit on punctuation. Voice quality is quite bad though, not as far from the old school C64 SAM (https://discordier.github.io/sam/) of the eighties as I expected.

2 days ago [ - ]

[deleted]

Retr0id 2 days ago [ - ]

I tried to replicate their demo text but it doesn't sound as good for some reason.

If anyone else wants to try:

> Kitten TTS is an open-source series of tiny and expressive text-to-speech models for on-device applications. Our smallest model is less than 25 megabytes.

cortesoft 2 days ago [ - ]

Is the demo using a model other than the smallest one?

Retr0id 2 days ago [ - ]

Perhaps, but the 25MB model is the only thing they've released

quantummagic 2 days ago [ - ]

Doesn't work here. Backend module returns 404:

https://clowerweb.github.io/node_modules/onnxruntime-web/dis...

Retr0id 2 days ago [ - ]

Looks like this commit 15 minutes ago broke it https://github.com/clowerweb/kitten-tts-web-demo/commit/6b5c...

(seems reverted now)

itake 2 days ago [ - ]

> Error generating speech: failed to call OrtRun(). ERROR_CODE: 2, ERROR_MESSAGE: Non-zero status code returned while running Expand node. Name:'/bert/Expand' Status Message: invalid expand shape

Doesn't seem to work with Thai.

jainilprajapati 2 days ago [ - ]

You can also try on https://clowerweb.github.io/node_modules/onnxruntime-web/dis...

nxnsxnbx 2 days ago [ - ]

Thanks, I was looking for that. While the Reddit demo sounds ok, albeit at a level we reached a couple of years ago, all the TTS samples I tried were barely understandable.

divamgupta 2 days ago [ - ]

This is just an early checkpoint. We hope that the quality will improve in the future.

Aardwolf 2 days ago [ - ]

On PC it's a Python dependency hell, but someone managed to package it in self-contained JS code that works offline once it has loaded the model? How is that done?

a2128 2 days ago [ - ]

ONNXRuntime makes it fairly easy: you just need to provide a path to the ONNX file, give it inputs in the correct format, and use the outputs. The ONNXRuntime library handles the rest. You can see this in the main.js file: https://github.com/clowerweb/kitten-tts-web-demo/blob/main/m...

Plus, Python software is dependency hell in general, while webpages have to be self-contained by their nature (thank god we no longer have Silverlight and Java applets...)
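
The load, feed, and read flow described above can be sketched with the Python `onnxruntime` package. The web demo uses the JavaScript `onnxruntime-web` package instead, so this is an analogous illustration rather than the demo's code. The helper names `load_session`, `describe_inputs`, and `run_model` are hypothetical, and the model's actual input names are not stated in this thread, so the sketch discovers them at runtime instead of assuming them.

```python
from typing import Any, Mapping, Sequence


def load_session(model_path: str) -> Any:
    """Create an ONNX Runtime inference session from a model file on disk."""
    # Third-party dependency (pip install onnxruntime), imported lazily so the
    # other helpers can be used without it.
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def describe_inputs(session: Any) -> list[tuple[str, Any, str]]:
    """Return (name, shape, type) for each model input.

    Useful for finding out what "inputs in the correct format" means for a
    given model before building the feed dictionary.
    """
    return [(i.name, i.shape, i.type) for i in session.get_inputs()]


def run_model(session: Any, feeds: Mapping[str, Any]) -> Sequence[Any]:
    """Run the model on the given feeds and return all of its outputs."""
    expected = {i.name for i in session.get_inputs()}
    missing = expected - set(feeds)
    if missing:
        raise ValueError(f"missing model inputs: {sorted(missing)}")
    # Passing None as the output list asks the runtime for every output.
    return session.run(None, dict(feeds))
```

In use, one would call `load_session` with the path to the downloaded `.onnx` file, print `describe_inputs(session)` to see what the model expects, and pass NumPy arrays keyed by those names to `run_model`.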

Jotalea 20 hours ago [ - ]

Using male voice 2 at 48kHz at 0.5x speed sounds a lot like Madeline's voice lines in Celeste. Seemed funny to me.

scotty79 2 days ago [ - ]

It feels like it doesn't handle punctuation well. I don't hear sentence boundaries and commas. It sounds like a continuous stream of words.

rohan_joshi 2 days ago [ - ]

Yeah, this is just a preview model from an early checkpoint. The full model release will be next week and will include a 15M model and an 80M model, both of which will have much higher quality than this preview.

belchiorb 2 days ago [ - ]

This doesn’t seem to work on Safari. Works great on Chrome, though

divamgupta 2 days ago [ - ]

Hmm, we will look into it.

tapper 2 days ago [ - ]

You should post on the NVDA email list (https://nvda.groups.io/g/nvda) or the screen reader list (https://winaccess.groups.io/g/winaccess). FYI, blind people do not like any lag when reading, which is why so many still use Eloquence and eSpeak.

kenarsa 2 days ago [ - ]

[flagged]

gary_0 2 days ago [ - ]

Not open source. "You will need internet connectivity to validate your AccessKey with Picovoice license servers ... If you wish to increase your limits, you can purchase a subscription plan." https://github.com/Picovoice/orca#accesskey

papichulo2023 2 days ago [ - ]

The guy is just spamming the project in a lot of comments.

cakealert 2 days ago [ - ]

Going online is a dealbreaker, but if you really need it you could use Ghidra to fix that. I tried to find a conversion of their model to ONNX (making their proprietary pipeline useless) but failed.

Hopefully open source will render them irrelevant in the future.

satvikpendem 2 days ago [ - ]

Does an APK for Android exist for replacing its text-to-speech engine? I tried sherpa-onnx, but it seemed too slow for real-time usage, especially for audiobooks when sped up.

kenarsa 2 days ago [ - ]

[flagged]

satvikpendem 2 days ago [ - ]

I can't test this out right now. Is this just a demo, or is it actually an APK for replacing the engine? Those are two different things; the latter can be used any time you want to read something aloud on the page, for example. This is the sherpa-onnx one I'm talking about:

https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html