I opened a couple of PRs to fix this situation:
https://github.com/KittenML/KittenTTS/pull/21 https://github.com/KittenML/KittenTTS/pull/24 https://github.com/KittenML/KittenTTS/pull/25
If you have `uv` installed, you can try my merged ref that has all of these PRs (and #22, a fix for short generations being trimmed unnecessarily) with:
uvx --from git+https://github.com/akx/KittenTTS.git@pr-21-22-24-25 kittentts --output output.wav --text "This high quality TTS model works without a GPU"
Thanks for the quick intro to uv. It looks like Docker layers for Python.
I found the TTS a bit slow, so I piped the output into ffplay with a 1.2x speedup to make it sound a bit better.
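For anyone wanting to reproduce the speedup, here is a rough sketch of one way to do it, not necessarily the exact command used above. It assumes the audio was already written to `output.wav` (as in the `uvx` command earlier) and that `ffplay` is on your PATH. The helper names `ffplay_command` and `play` are made up for illustration; `-nodisp`, `-autoexit`, `-af` and the `atempo` filter are standard FFmpeg options.

```python
import shutil
import subprocess


def ffplay_command(path: str, speed: float = 1.2) -> list[str]:
    """Build an ffplay invocation that plays `path` at `speed`x tempo.

    Hypothetical helper. The range check is a conservative assumption:
    older FFmpeg builds limit atempo to 0.5..2.0 per filter instance.
    """
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    return [
        "ffplay",
        "-nodisp",    # no video/visualisation window
        "-autoexit",  # quit when playback finishes
        "-af", f"atempo={speed}",  # change tempo without changing pitch
        path,
    ]


def play(path: str, speed: float = 1.2) -> None:
    """Run ffplay on `path`; raises if ffplay is not installed."""
    if shutil.which("ffplay") is None:
        raise RuntimeError("ffplay not found on PATH")
    subprocess.run(ffplay_command(path, speed), check=True)
```

Calling `play("output.wav")` after generating the file should play it back about 20% faster at the same pitch.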
Ah, yeah, good catch – I added the model-native speed multiplier to the CLI too (`--speed=1.2` for instance).
https://github.com/KittenML/KittenTTS/pull/21/commits/0aacfc...
Nice one, thanks!