Awesome work! Often times in the TTS space, human-similarity is given way too much emphasis at the expense of hurting user access. Frankly as long as a voice is clear and you listen to it for a while, the brain filters out most quirks you would perceive on the first pass. Hence why many blind folks still are perfectly fine using espeak-ng. The other properties like speed of generation and size make it worth it.

I've been using a custom AI audiobook generation program [0] with piper for quite a while now and am very excited to look at integrating kitten. Historically piper has been the only good option for a free CPU-only local model so I am super happy to see more competition in the space. Easy installation is a big deal, since piper historically has had issues with that. (Hence why I had to add auto installation support in [0])

[0] https://github.com/C-Loftus/QuickPiperAudiobook