Parakeet V3 has over twice the parameter count of Moonshine Medium (600M vs 245M), so it's not an apples-to-apples comparison.
I'm actually a little surprised they haven't added model size to that chart.
Parakeet v3 has a much better RTFx than Moonshine, so it's not just about parameter count: it actually runs faster.
https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
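For anyone new to the metric: RTFx on that leaderboard is the inverse real-time factor, i.e. seconds of audio transcribed per second of processing, so higher is faster. Below is a minimal sketch for measuring it yourself. `transcribe` is a hypothetical stand-in for whatever model you are benchmarking, and summing audio and wall-clock time over all clips before dividing is an assumed aggregation choice, not necessarily the leaderboard's exact harness.

```python
import time

def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio per second of compute (>1 is faster than real time)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds

def measure_rtfx(transcribe, clips):
    """Aggregate RTFx over (audio, duration_seconds) pairs.

    `transcribe` is a hypothetical callable for the model under test.
    Divides total audio by total time rather than averaging per-clip ratios.
    """
    total_audio = 0.0
    total_time = 0.0
    for audio, duration in clips:
        start = time.perf_counter()
        transcribe(audio)
        total_time += time.perf_counter() - start
        total_audio += duration
    return rtfx(total_audio, total_time)
```

For example, transcribing 60 s of audio in 6 s gives `rtfx(60.0, 6.0) == 10.0`, which is what people mean by "10x real time".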
That was my experience when I tried Moonshine against Parakeet v3 via Handy. Moonshine was noticeably slower on my 2018-era Intel i7 PC, and didn't seem as accurate either. I'm glad it exists, and I like the smaller size on disk (and presumably RAM too). But for my purposes with Handy I think I need the extra speed and accuracy Parakeet v3 is giving me.
It is about parameter count if what you care about is edge devices with limited RAM. Beyond a certain size your model just doesn't fit; it doesn't matter how good it is, you still can't run it.
I'm not sure what "edge" device you want to run this on, but you can compress Parakeet to under 500MB of RAM/disk using dynamic quantization with on-the-fly dequantization (GGUF-style, or Core ML centroid palettization), while retaining almost all of its accuracy.
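The arithmetic behind "under 500MB" is just parameters times bits per weight: 500MB × 8 / 600M parameters ≈ 6.7 bits per weight, so anything at or below roughly 6 bits fits. The sketch below shows that calculation plus a toy version of centroid palettization (1-D k-means: store a small table of centroids and an n-bit index per weight, then dequantize with a table lookup). It is illustrative only and assumes plain Python lists; it does not reproduce Core ML's or GGUF's actual formats (GGUF's common quant types use per-block scales rather than a k-means palette).

```python
import random

def footprint_mb(num_params: int, bits_per_weight: float) -> float:
    """Weight storage in MB (10**6 bytes); ignores lookup tables, scales and activations."""
    return num_params * bits_per_weight / 8 / 1e6

def _nearest(w, centroids):
    # Index of the centroid closest to weight w.
    return min(range(len(centroids)), key=lambda c: abs(w - centroids[c]))

def palettize(weights, n_bits=4, iters=20):
    """Toy 1-D k-means palettization: 2**n_bits centroids plus an n_bits index per weight."""
    k = 2 ** n_bits
    lo, hi = min(weights), max(weights)
    # Start with centroids spread evenly across the weight range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        indices = [_nearest(w, centroids) for w in weights]
        sums, counts = [0.0] * k, [0] * k
        for w, idx in zip(weights, indices):
            sums[idx] += w
            counts[idx] += 1
        # Move each centroid to the mean of its members; leave empty clusters where they are.
        centroids = [sums[c] / counts[c] if counts[c] else centroids[c] for c in range(k)]
    # Final assignment against the updated centroids.
    indices = [_nearest(w, centroids) for w in weights]
    return centroids, indices

def dequantize(centroids, indices):
    """On-the-fly dequantization is a table lookup per weight."""
    return [centroids[i] for i in indices]

if __name__ == "__main__":
    for bits in (16, 8, 6, 4):
        print(f"600M params @ {bits} bits: {footprint_mb(600_000_000, bits):.0f} MB")
    random.seed(0)
    w = [random.gauss(0.0, 0.02) for _ in range(2000)]
    table, idx = palettize(w, n_bits=4)
    err = max(abs(a - b) for a, b in zip(w, dequantize(table, idx)))
    print(f"max abs reconstruction error with 16 centroids: {err:.5f}")
```

For instance, `footprint_mb(600_000_000, 6)` is 450.0, which lands under the 500MB figure; the centroid table itself adds only 2**n_bits values per palettized tensor.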
And just to be clear, 500MB is small enough even for a Raspberry Pi. At that point your bottleneck isn't memory, it's FLOPS. It might run in real time on a Raspberry Pi 5, since it has around 50 GFLOPS of FP32 compute, so roughly 100 GFLOPS at FP16. That's about 20-50 times less than a modern iPhone. To be fair, I don't think it will quite keep up in real time, but it should be close.
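To see why "close to real time" is plausible, here is a back-of-envelope estimate of RTFx from compute alone. Every input is an assumption: roughly 2 FLOPs per parameter per encoder frame (one multiply-add per weight), about 12.5 encoder frames per second of audio (80 ms frames, as in 8x-subsampled FastConformer-style encoders), and a guessed fraction of peak FLOPS actually achieved. It ignores attention's quadratic cost, memory bandwidth, and the decoder, so treat the output as an order-of-magnitude sanity check.

```python
def estimated_rtfx(num_params: float, frames_per_audio_second: float,
                   peak_gflops: float, utilization: float) -> float:
    """Compute-bound estimate of seconds of audio processed per second.

    Assumes ~2 FLOPs per parameter per encoder frame; ignores attention's
    quadratic term, memory bandwidth and decoding overhead.
    """
    flops_per_audio_second = 2 * num_params * frames_per_audio_second
    effective_flops = peak_gflops * 1e9 * utilization
    return effective_flops / flops_per_audio_second

if __name__ == "__main__":
    # ~600M params, 12.5 frames/s, ~100 GFLOPS FP16 peak (Raspberry Pi 5 figure from the comment).
    for util in (0.1, 0.2, 0.4):
        print(f"utilization {util:.0%}: RTFx ~ {estimated_rtfx(600e6, 12.5, 100, util):.2f}")
```

With these numbers, 2 × 600M × 12.5 ≈ 15 GFLOPs per second of audio, so the estimate lands around 1x real time at 10-20% utilization; plugging in a device with 20-50 times more compute scales the result proportionally.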
Regardless, with that quantization strategy this model runs at 10x+ real time even on 6-year-old iPhones (which you can buy for under $200), and offline at a reasonable speed essentially anywhere.
You get the best of both worlds: the accuracy of a Whisper-class transformer at the speed and footprint of a small model.
So I'm kinda new to this whole Parakeet and Moonshine stuff, and I'm able to run Parakeet on a low-end CPU without issues, so I'm curious how much those parameter savings actually translate into in practice.
Oh, and I typed this in Handy using just my voice and Parakeet v3, which is absolutely crazy.