The headline feature isn’t the 25 MB footprint alone. It’s that KittenTTS is Apache-2.0. That combo means you can embed a fully offline voice in Pi Zero-class hardware or even battery-powered toys without worrying about GPUs, cloud calls, or restrictive licenses. In one stroke it turns voice everywhere from a hardware/licensing problem into a packaging problem. Quality tweaks can come later; unlocking that deployment tier is the real game-changer.
yeah, we are super excited to build tiny ai models that are super high quality. local voice interfaces are inevitable and we want to power those in the future. btw, this model is just a preview, and the full release next week will be of much higher quality, along w another ~80M model ;)
> It’s that KittenTTS is Apache-2.0
Have you seen the code[1] in the repo? It uses phonemizer[2] which is GPL-3.0 licensed. In its current state, it's effectively GPL licensed.
[1]: https://github.com/KittenML/KittenTTS/blob/main/kittentts/on...
[2]: https://github.com/bootphon/phonemizer
Edit: It looks like I replied to an LLM generated comment.
The issue is even bigger: phonemizer is using espeak-ng, which isn't very good at turning graphemes into phonemes. In other TTS which rely on phonemes (e.g. Zonos) it turned out to be one of the key issues which cause bad generations.
And it isn't something you can fix, because the model was trained on bad phonemes (everyone uses Whisper + then phonemizes the text transcript).
https://github.com/KittenML/KittenTTS/issues/17
> IANAL, but AFAICS this leaves 2 options, switching the license or removing that dependency.
There is a third option: asking the project for an exception.
Though that is unlikely to be granted[1], leaving you back with just the other two options.
And of course a fourth choice: just ignore the license. This is the option taken by companies like Onyx, whose products I might otherwise be interested in…
----
[1] Those of us who pick GPL3 or AGPL generally do so to keep things definite and an exception would muddy the waters, also it might not even be possible if the project has many maintainers as relicensing would require agreement from all who have provided code that is in the current release. Furthermore, if it has inherited the license from one of its dependencies, an exception is even less practical.
> There is a third option: asking the project for an exception.
IIUC, the project isn't at the liberty to grant such an exception because it inherits its GPL license from espeak-ng.
Ah, yes, good catch, I didn't look deeper into the dependency tree at all. I'll update my footnote to include that as one of the reasons an exception may be impossible (or at least highly impractical).
A fourth option would be a kind of dual-licensing: the project as-is is available under GPL-3.0, but the source code in this repository excluding any dependencies is also available under Apache 2.0
Any user would still effectively be bound by the GPL-3.0, but if someone can remove the GPL dependencies they could use the project under Apache
That is an option for the publisher of the library, not the consumer of it. If it isn't already done then asking for it to be done is the same as asking for an exception otherwise (option three).
The use of the library is four lines. Three set up the library (`phonemizer.backend.EspeakBackend(language="en-us", preserve_punctuation=True, with_stress=True)`), the other calls it (`phonemes_list = self.phonemizer.phonemize([text])`). Plus I guess the import statements. Even ignoring Google vs Oracle I don't think those lines by themselves meet any threshold of originality.
Obviously you can't run them (with the original library) without complying with the GPL. But I don't see why I couldn't, independently of that, also give you this text file under Apache 2.0 to do with as you want (which for the record still doesn't allow you to run them with the original library without complying with the GPL, but that'd be phonemizer forcing you to do that, not this project).
You would have to be very specific about the dual-licensing to avoid confusion about what you are allowed to do under Apache conditions though. You can't just say "it's dual-licensed"
You could even extract out the parts that do not call the GPL library into an upstream project under the Apache 2.0 licence, and pull in both that and the GPL library in the downstream project, relying on Apache 2.0 -> GPL 3.0 compatibility instead of explicit dual licensing to allow the combined work to be distributed under GPLv3.
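A minimal sketch of what that split could look like in Python. All names here (`Phonemizer`, `text_to_phonemes`, `EspeakPhonemizer`, `UppercasePhonemizer`) are hypothetical and not taken from the KittenTTS repo; only the `EspeakBackend(...)` arguments and the `phonemize([text])` call come from the lines quoted above.

```python
from typing import List, Protocol


class Phonemizer(Protocol):
    """Hypothetical interface the permissively licensed core depends on."""

    def phonemize(self, texts: List[str]) -> List[str]: ...


def text_to_phonemes(text: str, phonemizer: Phonemizer) -> str:
    """Core-side helper: knows only the interface, never imports the GPL package."""
    return phonemizer.phonemize([text])[0]


class EspeakPhonemizer:
    """Downstream adapter: the only place that touches the GPL-3.0 library."""

    def __init__(self) -> None:
        # Imported lazily so the core module loads without phonemizer installed.
        from phonemizer.backend import EspeakBackend

        self._backend = EspeakBackend(
            language="en-us", preserve_punctuation=True, with_stress=True
        )

    def phonemize(self, texts: List[str]) -> List[str]:
        return self._backend.phonemize(texts)


class UppercasePhonemizer:
    """Toy stand-in that only exercises the interface; not a real G2P."""

    def phonemize(self, texts: List[str]) -> List[str]:
        return [t.upper() for t in texts]
```

The core would ship `Phonemizer` and `text_to_phonemes`; the downstream project would ship `EspeakPhonemizer`. Whether this layout actually changes the licensing picture is a legal question; the sketch only shows the code organization.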
Once the license issues are resolved, it would be nice if you could install it on a distro with the normal package manager.
This would only apply if they were distributing the GPL licensed code alongside their own code.
If my MIT-licensed one-line Python library has this line of code…
…I’m not suddenly subject to bash’s licensing. For anyone wanting to run my stuff though, they’re going to need to make sure they themselves have bash installed. (But, to argue against my own point, if an OS vendor ships my library alongside a copy of bash, do they have to now relicense my library as GPL?)
The FSF thinks it counts as a derivative work and you have to use the LGPL to allow linking.
However, this has never actually been proven in court, and there are many good arguments that linking doesn't count as a derivative work.
Old post by a lawyer someone else found (version 3 wouldn't affect this) [1]
For me personally I don't really understand how, if dynamic linking was viral, using linux to run code isn't viral. Surely at some level what linux does to run your code calls GPLed code.
It doesn't really matter though, since the FSF stance is enough to scare companies away from using it, and any individual is highly unlikely to be sued.
[1] https://www.linuxjournal.com/article/6366
> For me personally I don't really understand how, if dynamic linking was viral, using linux to run code isn't viral. Surely at some level what linux does to run your code calls GPLed code.
The Linux kernel has an explicit exception for userspace software:
> NOTE! This copyright does not cover user programs that use kernel services by normal system calls
And the GPL also has an explicit exception for "system" software such as kernel, platform libraries etc.:
> The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it.
> The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work.
> This would only apply if they were distributing the GPL licensed code alongside their own code.
As far as I understand the FSF's interpretation of their license, that's not true. Even if you only dynamically link to GPL-licensed code, you create a combined work which has to be licensed, as a whole, under the GPL.
I don't believe that this extends to calling an external program via its CLI, but that's not what the code in question seems to be doing.
(This is not an endorsement, but merely my understanding on how the GPL is supposed to work.)
This is a false analogy. It's quite straightforward.
Running bash (via exec()/fork()/spawn()/etc) isn't the same as (statically or dynamically) linking with its codebase. If your MIT-licensed one-liner links to code that's GPL licensed, then it gets infected by the GPL license.
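The distinction drawn above can be sketched in Python. The function names are hypothetical, and the in-process import only stands in for linking: whether a Python import is treated like linking for license purposes is an assumption here, not something this thread settles.

```python
import importlib
import subprocess
from typing import Any, List


def run_as_separate_process(argv: List[str]) -> str:
    """Start another program in its own process and return its stdout.

    The other program's code is never loaded into this process; the two
    sides only exchange arguments, exit status and text.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def call_in_process(module_name: str, func_name: str, *args: Any) -> Any:
    """Load a library into this process and call one of its functions.

    Here the library's code runs inside the caller's address space, which
    is the situation the linking argument is about.
    """
    module = importlib.import_module(module_name)
    return getattr(module, func_name)(*args)
```

For example, `run_as_separate_process(["bash", "-c", "echo hello"])` shells out to bash (assuming it is installed), while `call_in_process("json", "dumps", [1, 2])` pulls a module into the running interpreter.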
I've seen people use IPC to work around the GPL, but I've also seen FSF interpretations claiming that is still a derived work.
I don't know if this has ever been tested in court.
My interpretation of their FAQ[1] on it is that shelling out and IPC are fine, while linking is not. As you say, it's ultimately up to the courts to decide on.
[1]: https://www.gnu.org/licenses/gpl-faq.html#MereAggregation
you are correct. it's about linking as in what LD does, not conceptual linking.
GPL is for boomers at this point. Floppy disks? Distribution? You can use a tool but you can't change it? A DLL call means you need to redistribute your code but forking doesn't?
Silliness
GPL post-dates network software distribution (we got our first gcc via ftp).
Yes, but if you use open source libraries for your closed source SaaS, that's fine. People get their software delivered to them _over_ the network in a VM (your browser).
Given that the FSF considers Apache-2.0 to be compatible with GPL-3.0 [0], how could the fact that phonemizer is GPL-3.0 possibly be an issue?
[0]: https://www.gnu.org/licenses/license-list.html#apache2
Compatible means they can be linked together, BUT the result is GPL-3.
> the result is GPL-3
The result can only be distributed under the terms of the GPL-3. That's actually a crucial difference: there's nothing preventing Kitten TTS from being Apache licensed, soliciting technical contributions under that license, and parts of its code being re-used in other software under that license. Yes, for the time being, this limits what you can do with Kitten TTS if you want to use the software as a whole (e.g. by embedding it into your product), but the license itself is still Apache and that can have value.
Okay, what's stopping you from feeding the code into an LLM, having it rewrite the code, and making it yours? You can even add extra steps, like making it analyze the code block by block and then supervising it as it rewrites. Bam. AI age IP freedom.
Morals may stop you but other than that? IMHO all open source code is public domain code if anyone is willing to spend some AI tokens.
That would be a derivative work, and still be subject to the license terms and conditions, at best.
There are standard ways to approach this called clean room engineering.
https://en.m.wikipedia.org/wiki/Clean-room_design
One person reads the code and produces a detailed technical specification. Someone reviews it to ensure that there is nothing in there that could be classified as copyrighted material, then a third person (who has never seen the original code) implements the spec.
You could use an LLM at both stages, but you'd have to be able to prove that the LLM that does the implementation had no prior knowledge of the code in question... Which given how LLMs have been trained seems to me to be very dubious territory for now until that legal situation gets resolved.
AI is useful in Chinese walling code, but it’s not as easy as you make it sound. To stay out of legal trouble, you probably should refactor the code into a different language, then back into the target language. In the end, it turns into a process of being forced to understand the codebase and supervising its rewriting. I’ve translated libraries into another language using LLMs, I’d say that process was 1/2 the labor of writing it myself. So in the end, going 2 ways, you may as well rewrite the code yourself… but working with the LLM will make you familiar with the subject matter so you -could- rewrite the code, so I guess you could think of it as a sort of buggy tutorial process?
I am not sure even that is enough. You would really need to do a clean room reimplementation to be safe - for exactly the same reasons that people writing code write clean room reimplementations.
Yeah, the algorithms and program flow would have to be materially distinct to be really safe. Maybe switching language paradigms would get that for you in most cases? Js->haskell->js? Sounds like a nightmare lol.
Tell me you haven't used LLMs on large, non-trivial codebases without telling me... :)
Tell me you don't know how to use LLMs properly without telling me.
You don't give the whole codebase to an LLM and expect one-shot output. Instead, you break it down and write the code block by block. Then the size of the codebase doesn't matter. You use the LLM as a tool; it is not supposed to replace you. You don't try to become George from the Jetsons, who just presses a button and doesn't touch anything; instead you stay on top of it as the LLM does the coding. You test the code at every step to see if the implementation behaves as expected. Do enough of this and you have proper, full "bespoke" software.
I'll help you along - this is the core function that Kitten ends up calling. Good luck!
https://github.com/espeak-ng/espeak-ng/blob/a4ca101c99de3534...
Festival's English model, festvox-kallpc16k, is about 6 MB, and that is a large model; festvox-kallpc8k is about 3.5 MB.
eSpeak NG's data files take about 12 MB (multi-lingual).
I guess this one may generate more natural-sounding speech, but older or lower-end computers were capable of decent speech synthesis previously as well.
Custom voices could be added, but the speed was more important to some users.
$ ls -lh /usr/bin/flite
Listed as 27K last I checked.
I recall some Blind users were able to decode Gordon 8-bit dialogue at speeds most people found incomprehensible. =3
I'm not blind, but spoken English is far more difficult for me to grasp than written English (I'm a non-native speaker), and Flite runs on N270 netbooks at crazy speeds with voices that are really good enough.
> KittenTTS is Apache-2.0
What about the training data? Is everyone 100% confident that models are not a derived work of the training inputs now, even if they can reproduce input exactly?
I'm playing around with an NVIDIA Jetson Orin Nano Super right now and it's actually pretty usable with gemma3:4b and quite fast; even image processing is done in about 10-20 seconds, but this is with GPU support. When something is not working and ollama is not using the GPU, these calls take ages because the CPU is just bad.
I am curious how fast this is with CPU only.
It depends on espeak-ng which is GPLv3
This opens up voice interfaces for medical devices, offline language learning tools, and accessibility gadgets for the visually impaired - all markets where cloud dependency and proprietary licenses were showstoppers.
But Pi Zero has a GPU, so why not make use of it?
Because then you're stuck on that device only.
The GitHub repo just has a few KB of Python that looks like an install script. How is this used from C++?