That's a great point about the dependencies.
To make the setup easier and add a few features people are asking for here (like GPU support and long text handling), I built a self-hosted server for this model: https://github.com/devnen/Kitten-TTS-Server
The goal was a setup that "just works" using a standard Python virtual environment to avoid dependency conflicts.
The setup is just the standard git clone, pip install in a venv, and python server.py.
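For anyone who would rather script those three steps, here is a rough sketch in Python. It assumes the dependencies live in a file called requirements.txt and that the clone lands in a Kitten-TTS-Server directory; neither is confirmed here, so check the repo README first. The function name and the --run flag are made up for this illustration. By default it only prints the commands.

```python
import os
import subprocess
import sys

REPO_URL = "https://github.com/devnen/Kitten-TTS-Server"
REPO_DIR = "Kitten-TTS-Server"     # assumption: default clone directory name
REQUIREMENTS = "requirements.txt"  # assumption: check the repo for the real file name


def setup_commands(repo_dir=REPO_DIR, venv_dir="venv"):
    """Return (cwd, argv) pairs for clone, venv creation, pip install, and launch."""
    # Virtual environments keep their interpreter under Scripts on Windows, bin elsewhere.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    venv_python = os.path.abspath(os.path.join(repo_dir, venv_dir, bin_dir, "python"))
    return [
        (None, ["git", "clone", REPO_URL, repo_dir]),
        (repo_dir, [sys.executable, "-m", "venv", venv_dir]),
        (repo_dir, [venv_python, "-m", "pip", "install", "-r", REQUIREMENTS]),
        (repo_dir, [venv_python, "server.py"]),
    ]


def run_all(commands):
    """Run each step in order and stop at the first failure."""
    for cwd, argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    steps = setup_commands()
    if "--run" in sys.argv:
        run_all(steps)
    else:
        # Dry run: show what would be executed without touching anything.
        for cwd, argv in steps:
            print(f"[{cwd or '.'}] {' '.join(argv)}")
```

Calling the venv's own interpreter directly means nothing has to be "activated" first, which is why the last two steps use that path instead of the system python.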
Oh wow, really impressive. How long did this take you to make?
It didn't take too long. I already have two similar projects for the Dia and Chatterbox TTS models, so I just needed to convert a few files.