I tried Kokoro for voicing blog posts and articles and, to be honest, wasn't impressed. Right now Gemini 2.5 Flash TTS is a much more capable system with generous free limits (about 10 minutes per generation and about 90 minutes per day). Voices aren't very consistent between generations, which isn't a big deal for shorter pieces but obviously will be for books.
Kokoro is fine for TTS, but it lacks emotion. For a model of its size, though, that's more or less a given.
I played with ebook generation a bunch and found that (at least for English text) a model of around 1B parameters is needed to get something emotionally usable (Chatterbox is 0.5B, Orpheus is 3B).
Ironic given the name: kokoro is Japanese for heart or sentiment.