Having built with and tried every voice model over the last three years, real-time and non-real-time... this is off the charts compared to anything I've seen before.
And open weight too! So grateful for this.
This past month, Parakeet v3 dropped: a 0.6B-parameter streaming ASR model that can run on a CPU and is super good.
I did say all the models. :)
Yes I've tried Parakeet v3 too. For its own purpose - running locally - it's amazing.
The thing that's particularly amazing about this Voxtral model is how incredibly rock solid the accuracy is.
For the longest time, previous models have been 'mostly correct', or, as people have commented elsewhere in this HN thread, have dropped sentences or lost or added utterances.
I have no affiliation with these folks, but I tried hard to make this model break, even speaking as adversarially as I could, and struggled to do it.
That's a totally different class of model.
Do you mean https://huggingface.co/nvidia/nemotron-speech-streaming-en-0... ?
Yes, that's it.
What's the business plan here?