According to the OpenASR Leaderboard [1], looks like Parakeet V2/V3 and Canary-Qwen (a Qwen finetune) handily beat Moonshine. All 3 models are open, but Parakeet is the smallest of the 3. I use Parakeet V3 with Handy and it works great locally for me.

[1]: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

Parakeet V3 is over twice the parameter count of Moonshine Medium (600m vs 245m), so it's not an apples to apples comparison.

I'm actually a little surprised they haven't added model size to that chart.

Parakeet V3 has a much better RTFx than Moonshine; it's not just about parameter count. It runs faster.

https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

That was my experience when I tried Moonshine against Parakeet v3 via Handy. Moonshine was noticeably slower on my 2018-era Intel i7 PC, and didn't seem as accurate either. I'm glad it exists, and I like the smaller size on disk (and presumably RAM too). But for my purposes with Handy I think I need the extra speed and accuracy Parakeet v3 is giving me.

It is about the parameter numbers if what you care about is edge devices with limited RAM. Beyond a certain size your model just doesn't fit, it doesn't matter how good it is - you still can't run it.

I am not sure what "edge" device you want to run this on, but you can compress Parakeet to under 500MB on RAM/disk with dynamic quants and on-the-fly dequantization (GGUF or CoreML centroid-palettization style), and retain essentially all of the accuracy.
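To make the centroid-palettization idea concrete, here's a toy sketch in NumPy: cluster weights into a small palette of centroids and store only the indices. All numbers are illustrative; this is not the actual Parakeet export pipeline, just the general technique.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for one weight tensor of a model (fp32).
weights = rng.normal(size=10_000).astype(np.float32)

K = 16  # 4-bit palette: each weight becomes a 4-bit index into 16 centroids

# 1D k-means (Lloyd iterations), initialized at the weight quantiles
centroids = np.quantile(weights, np.linspace(0, 1, K)).astype(np.float32)
for _ in range(20):
    idx = np.abs(weights[:, None] - centroids[None, :]).argmin(axis=1)
    for k in range(K):
        mask = idx == k
        if mask.any():
            centroids[k] = weights[mask].mean()

# On-the-fly dequantization: look each index back up in the palette
idx = np.abs(weights[:, None] - centroids[None, :]).argmin(axis=1)
dequantized = centroids[idx]

orig_bytes = weights.nbytes                          # 4 bytes per weight
packed_bytes = weights.size // 2 + centroids.nbytes  # 4-bit indices + palette
print(f"compression:    {orig_bytes / packed_bytes:.1f}x")
print(f"mean abs error: {np.abs(weights - dequantized).mean():.4f}")
```

The same idea at 4-8 bits per weight is how a ~600M-parameter model fits in a few hundred MB with minimal accuracy loss.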

And just to be clear, 500MB is enough even for a Raspberry Pi. Then your problem isn't memory, it's FLOPS. It might run in real time on an RPi 5, since it has around 50 GFLOPS of FP32, i.e. 100 GFLOPS of FP16, so about 20-50 times less than a modern iPhone. I don't think it will be able to keep real time, TBF, but close.
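A back-of-envelope version of that estimate, with every number an assumption (the per-second compute cost of a ~600M-parameter encoder in particular is a rough guess):

```python
# Peak FP16 throughput figures from the comment above (assumed).
rpi5_fp16_gflops = 100.0     # ~2x the ~50 GFLOPS FP32 figure
iphone_fp16_gflops = 2000.0  # assuming ~20x the Pi, the low end of the range

# Assumed cost: ~0.6B weights, ~2 FLOPs per weight per encoder frame,
# ~10 frames per second of audio after subsampling.
gflops_per_audio_second = 2 * 0.6 * 10  # 12 GFLOPs per second of audio

# Real-time factor at *peak* FLOPS; actual CPU/NPU utilization is far
# lower, which is why the Pi may still fall short of real time in practice.
rtf_pi = rpi5_fp16_gflops / gflops_per_audio_second
rtf_iphone = iphone_fp16_gflops / gflops_per_audio_second
print(f"RPi 5 (peak):  ~{rtf_pi:.0f}x real time")
print(f"iPhone (peak): ~{rtf_iphone:.0f}x real time")
```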

Regardless, with such a quantization strategy this model runs at a 10x+ real-time factor even on 6-year-old iPhones (which you can acquire for under $200), and offline at a reasonable speed essentially anywhere.

You get the best of both worlds: the accuracy of a Whisper-class transformer at the speed and footprint of a small model.

So I'm kinda new to this whole Parakeet and Moonshine stuff, and I'm able to run Parakeet on a low-end CPU without issues, so I'm curious how much those extra parameter savings actually translate into in practice.

Oh, and I typed this in Handy with just my voice and Parakeet version three, which is absolutely crazy.

To this comment and all the other comments below mentioning Handy: I tried Handy just now and it's super amazing. I'm speaking this through Handy. This is so cool, man.

And Handy even takes care of all the punctuation, which is really nice.

Thanks a lot for suggesting it to me. I actually wanted something like this. I was using Google Docs, which required Chrome to get the speech-to-text feature, and I ended up using Orion for that, because Orion can somehow pass as Chrome while still supporting both Firefox and Chrome extensions. So I had it installed, but yeah.

This is really amazing, and actually sort of a lifesaver, so thanks a lot, man.

Now I can actually just speak and it converts it to text without having to go through any non-local model, or Google Docs, or anything else.

Why is this so good, man? It's so good.

Man, I actually thought I had fully maxed out my typing speed at around 100-120 WPM. But this can actually write faster than that. You know, it's pretty amazing.

Have a nice day, or as I abbreviate it, HAND, smiley face. :D

Was a big fan of Handy until I found Hex, which, incredibly, has even faster transcription (with Parakeet V3). It's macOS-only:

https://github.com/kitlangton/Hex

I tried this out but the brew command errors out saying it only works on macOS versions older than Sequoia.

That's unfortunate. I think I can update my macOS version, but I've heard some bad things about the newer update's performance from my elder brother.

> I tried this out but the brew command errors out saying it only works on macOS versions older than Sequoia.

Newer than Sequoia, you mean?

The brew recipe [1] says macOS >= 15.

Anyway, I'm on Sequoia — it's mostly better than Ventura, which was what my M2 MacBook Pro came with. I'm holding off upgrading to Tahoe (macOS 26), hoping they fix liquid glAss.

[1] https://formulae.brew.sh/cask/kitlangton-hex

Works fine on macOS Tahoe for me.

By the way, I've been using a Whisper model, specifically WhisperX, to do all my work, and for whatever reason I just simply was not familiar with the Handy app. I've now downloaded and used it, and what a great suggestion. Thank you for putting it here, along with the direct link to the leaderboard.

I can tell that this is now definitely going to be my go-to model and app on all my clients.

I have to ask: I see this Handy app running on Mac, and you hold a key down, and then the text doesn't show up until seemingly a while later.

The built-in one is much faster, and you only have to toggle it on.

Are these so much more accurate? I definitely have to correct stuff, but pretty good experience.

I also use speech-to-text on my iPhone, which seems to have about the same accuracy.

I'm building a local-first transcription iOS app and have been on Whisper Medium, switching to Parakeet V3 based on this.

One note for anyone using Handy with codex-cli on macOS: the default "Option + Space" shortcut inserts spaces mid-speech. "Left Ctrl + Fn" works cleanly instead. I'm curious to know which shortcuts you're using.

I am looking for such an app. Main use case is transcribing voice notes received on Signal while preserving privacy. Please post when you launch :)

Handy is amazing. Super quality app.

It really is. It's kinda ridiculous that it's free.

I'm quite surprised to see that level of polish from an open-source project.

Is your voice or a transcript sent back to their servers? If so, you may be the product.

No, it's just somebody's open source project: https://github.com/cjpais/handy

why V3 over V2 (assuming English only)?

Hmmm, looks like AssemblyAI is still unbeatable here in terms of cost/performance, unless I'm mistaken.

Edit: holy shit, Parakeet is good... Moonshine is impressive too, and it's half the parameter count.

Now if only there were something just as quick as Parakeet v3 for TTS! Then I could talk to codex all day long!

Also running parakeet on my phone with https://github.com/notune/android_transcribe_app

Very lightweight and good quality

This is actually pretty impressive. What kind of phone are you using? Are you noticing any battery drain or heat? Do you think it's possible to get this working with Flutter on iOS?

A 2-3 year old Android flagship with 8 GB RAM. When I looked for a Parakeet app, I think I also came across iOS apps; I don't recall which, since I use Android. It seems light on the phone/battery. I don't observe any drain, but I also only record shorter transcripts at a time. Side note: Parakeet is actually pretty nice for doing meetings with oneself. I did that on a computer while driving for an hour (split into several transcript chunks), then processed the raw meeting notes afterwards with an LLM. Effective use of the time in the car...

Thank you for sharing! What about the quality of the transcripts? Is it able to do live streaming?

Unfortunately, Parakeet doesn't support streaming like Moonshine does (as far as I know). It would be perfect to have something the size of Parakeet that supports streaming. Still hoping Nvidia releases a V4 with that feature :) Otherwise, I think STT is basically a solved problem running locally on edge devices.

I think there is a streaming version of Parakeet. It is often referred to as Nemotron, though.

I tried comparing Parakeet streaming with Moonshine streaming. Moonshine is smaller, and I felt it was subjectively faster with about the same level of accuracy.

Parakeet doesn't require a GPU. I'm handily running it on my Ubuntu Linux laptop.

I'm looking to switch from feeding the default Android "recorder" app's .WAV files into Gemini 3 Pro (via the app) with (usually just) a `Transcribe this please:` prompt; the content is usually German voice instructions/explanations of how to approach some sysadmin task. There does tend to be some amount of interjecting by me (primarily to pose or request clarifications) to resolve ambiguity as early as possible/practical.

If e.g. parakeet can be run on my phone in real time showing the transcript live:

- with latency low enough to be "comfortable enough" for the instructor to keep an eye on and approve the transcribed instructions

[not necessarily every word of the transcript, i.e., a commanded "edit" doesn't need to be applied in the outcome as long as its nature is otherwise clear enough to not add meaningful amounts of ambiguity to the final "written" instructions]

by glancing at the screen while dictating the explanation (and blurting out any transcription complaints as soon as that's possible without breaking one's own train of thought or spoken grammar too much)

, I'd very happily switch to that approach instead of what I was doing.

Bonus if there's a no-bulky-or-expensive-hardware way to accommodate us both speaking over each other, so I won't have to _interrupt_ his speaking just to put a clarifying comment (on what he just said) into the transcript for him to see and sign off, which at least only briefly interrupts his thoughts while he actually reads my transcribed words (he doesn't have to hear them, and it's better if he doesn't; I can probably get him to put on earmuffs so he doesn't hear me louder than his own thoughts, and a sufficiently smoothed SNR meter for his voice specifically should take care of him regulating his volume while the earmuffs mute me and I occasionally talk over him)...

[flagged]

LLM account

You are right, I just downloaded it in Handy and it's working, I can't believe it.

I was using AssemblyAI, but this is fast and accurate and offline, wtf!

Parakeet is amazing; it has completely ousted Whisper for me. On Linux, both handy.computer and Epicenter's Whispering (using Parakeet, of course) work incredibly well for set-and-forget STT. I use it constantly to write messages on Slack/Teams, debate with Claude Code, etc. Both have minor bugs, but I can easily accept those, these apps being FOSS and all.

On Mac, I've been using VoiceInk and it's even better. VoiceInk (and MacWhisper too, IIRC) use the neural engine and the delay between dictation and appearance of the typed text is almost imperceptible.

What's wrong with Piper?

How much VRAM does Parakeet take for you? For some reason it takes 4GB+ for me using the ONNX version, even though it's 600M parameters.

There are different versions of the Parakeet model. The 8-bit quantized version uses fewer bits per weight, so it saves space (only using about 600MB) while maintaining about the same level of accuracy.

I think most apps that use Parakeet tend to use this version of the model?
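The weight-only arithmetic behind those sizes is simple (a back-of-envelope sketch; it doesn't account for activations and runtime buffers, which are likely where the extra 4GB+ comes from):

```python
# Weight-only memory footprint for a 600M-parameter model at common precisions.
params = 600e6
footprint_gb = {}
for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    footprint_gb[name] = params * bytes_per_param / 1e9
    print(f"{name}: {footprint_gb[name]:.1f} GB")
# int8 lands around 0.6 GB, matching the ~600MB figure for the quantized build.
```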

See if Parakeet (Nemotron) still uses 4GB+ with my implementation: https://rift-transcription.vercel.app/local-setup