I tried it. Not bad for the size (of the model) and speed. Once you install all the massive number of libraries and things needed we are a far cry away from 25MB though. Cool project nonetheless.
That's a great point about the dependencies.
To make the setup easier and add a few features people are asking for here (like GPU support and long text handling), I built a self-hosted server for this model: https://github.com/devnen/Kitten-TTS-Server
The goal was a setup that "just works" using a standard Python virtual environment to avoid dependency conflicts.
The setup is just the standard git clone, pip install in a venv, and python server.py.
Oh wow, really impressive. How long did this take you to make?
It didn't take too long. I already have two similar projects for the Dia and Chatterbox TTS models, so I just needed to convert a few files.
It mentions ONNX, so I imagine an ONNX model is or will be available.
ONNX runtime is a single library, with C#'s package being ~115MB compressed.
Not tiny, but usually only a few lines to actually run and only a single dependency.
The repository already runs an ONNX model. But the ONNX model doesn't take English text as input; it takes tokenized phonemes. The preprocessing for that is where most of the dependencies come from.
Which is completely reasonable imho, but obviously comes with tradeoffs.
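To make that split concrete, here is a toy sketch of what that kind of preprocessing pipeline looks like in shape: text is phonemized, then phonemes are mapped to integer IDs that the ONNX session consumes. All the tables and names below are made up for illustration; the real project uses an actual phonemizer and the model's own vocabulary.

```python
# Hypothetical sketch of the preprocessing in front of the ONNX model:
# English text -> phonemes -> integer token IDs. The phoneme strings and
# ID table here are invented; a real pipeline uses a phonemizer library.

# Toy grapheme-to-phoneme table standing in for a real phonemizer.
G2P = {"hello": "h ə l oʊ", "world": "w ɜː l d"}

# Toy phoneme-to-ID vocabulary standing in for the model's tokenizer.
VOCAB = {p: i for i, p in enumerate(["h", "ə", "l", "oʊ", "w", "ɜː", "d"])}

def text_to_ids(text: str) -> list[int]:
    """Phonemize each word, then map every phoneme to its integer ID."""
    phonemes = " ".join(G2P[w] for w in text.lower().split())
    return [VOCAB[p] for p in phonemes.split(" ")]

ids = text_to_ids("hello world")
# These IDs (plus whatever padding/speaker inputs the model expects) are
# what would be fed to the ONNX session, not the raw English text.
```

The heavy dependencies live entirely in the first step (the phonemizer and its backend); the ONNX runtime itself only ever sees the integer IDs.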
For space sensitive applications like embedded systems, could you shift the preprocessing to compile time?
You would need to constrain the vocabulary to see any benefit, but that could be reasonable. For example, an enumeration of numbers, units, and metric names could handle dynamic time, temperature, and other dashboard items.
For something more complex like offline navigation, you already need to store a map. You could store street names as tokens instead of text. Add a few turn commands, and you have offline spoken directions without on-device pre-processing.
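One way to sketch that idea (all names and ID values below are hypothetical): run the full preprocessing pipeline at build time over the fixed vocabulary, ship only a table of precomputed token IDs, and have the device simply concatenate sequences at runtime.

```python
# Hypothetical sketch: token IDs precomputed at build time for a fixed
# vocabulary, so the device needs no phonemizer at runtime. The ID
# sequences are invented; a real build step would generate them with
# the same preprocessing pipeline the server uses.

# "Compile-time" table: phrase -> precomputed phoneme token IDs.
PRECOMPUTED = {
    "turn": [11, 4, 27],
    "left": [19, 8, 30, 2],
    "right": [14, 22, 2],
    "onto": [5, 9, 2, 16],
}

PAUSE = [0]  # token standing in for a short inter-word pause

def directions_to_ids(words: list[str]) -> list[int]:
    """Concatenate precomputed sequences; no on-device preprocessing."""
    ids: list[int] = []
    for w in words:
        ids.extend(PRECOMPUTED[w])
        ids.extend(PAUSE)
    return ids

# e.g. directions_to_ids(["turn", "left"]) yields the stored sequences
# for "turn" and "left" joined by pause tokens, ready for the model.
```

The tradeoff is exactly the one mentioned above: anything outside the precomputed table simply cannot be spoken, which is fine for dashboards and turn-by-turn prompts but not for open-ended text.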
We will try to get rid of dependencies.
Usually pulling in lots of libraries helps you develop and iterate faster. They can be removed later once the whole thing starts to take shape.
This case might be different, but ... usually that "later" never happens.