Not that I know of. I think the two most prominent open-source models that we hear about are Whisper and Parakeet!
Whisper doesn't list its training data (or training code?), so it can't be called an open-source model, just an open-weights model.
Parakeet does list its training data. At least one of those datasets is not FOSS, but some of them definitely are. I wonder if NVIDIA would create a fully FOSS model by retraining on only the open data.
https://huggingface.co/nvidia/parakeet-rnnt-1.1b#datasets https://catalog.ldc.upenn.edu/LDC2004T19 https://catalog.ldc.upenn.edu/license/ldc-non-members-agreem...