Hmm, looks like AssemblyAI is still unbeatable here in terms of cost/performance, unless I'm mistaken.

Edit: holy shit, Parakeet is good... Moonshine is impressive too, and it has half the parameters.

Now if only there was something just as quick as Parakeet v3 for TTS! Then I could talk to Codex all day long!

I'm also running Parakeet on my phone with https://github.com/notune/android_transcribe_app

Very lightweight and good quality

This is actually pretty impressive. What kind of phone are you using? Are you noticing any battery drain or heat? Do you think it's possible to get this working with Flutter on iOS?

A 2-3 year old Android flagship phone with 8 GB RAM. When I looked for an app for Parakeet, I think I also came across iOS apps. I don't recall which, since I use Android. It seems light on the phone/battery. I don't observe any drain, but I also only record shorter transcripts at once. Side note: Parakeet is actually pretty nice for doing meetings with oneself. Did that on a computer while driving for an hour (split into several transcript chunks). Processed the raw meeting notes afterwards with an LLM. Effective use of the time in the car...
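If anyone wants to do the chunking step on one long recording instead of recording in pieces, here is a minimal sketch using only Python's standard library. It assumes an uncompressed PCM WAV file; the function name `split_wav`, the output file naming, and the 5-minute default are my own illustrative choices, not anything Parakeet or the app above requires.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=300):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Hypothetical helper for illustration; returns the list of chunk paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        # Number of audio frames that make up one chunk.
        frames_per_chunk = int(reader.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of the recording
            path = out_dir / f"chunk_{index:03d}.wav"
            with wave.open(str(path), "wb") as writer:
                # Same channels/sample width/rate as the source; the frame
                # count in the header is corrected when the file is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            paths.append(path)
            index += 1
    return paths
```

Each chunk can then be transcribed separately and the texts concatenated before handing them to an LLM. Note that a fixed-length cut can land mid-word, so cutting on silence would be a better choice if that matters.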

Thank you for sharing! What about the quality of the transcripts? Is it able to do live streaming?

Unfortunately, Parakeet doesn't support streaming like Moonshine does (as far as I know). It would be perfect to have something the size of Parakeet that supports streaming. I still hope Nvidia releases a V4 with that feature :) Otherwise, I think STT running locally on edge devices is basically a solved problem.

I think there is a streaming version of Parakeet. It is often referred to as Nemotron, though.

I tried comparing Parakeet streaming with Moonshine streaming. Moonshine is smaller, and I felt it was subjectively faster with about the same level of accuracy.

Parakeet doesn't require a GPU. I'm handily running it on my Ubuntu Linux laptop.

I'm looking to switch away from feeding the default Android "recorder" app's .WAV into Gemini 3 Pro (via the app) with (usually just) a `Transcribe this please:` prompt. The content is usually German voice instructions/explanations for how to do/approach some sysadmin stuff; there does tend to be some amount of interjecting by me (primarily to ask for or offer clarifications) to resolve ambiguity as early as possible/practical.

If e.g. parakeet can be run on my phone in real time showing the transcript live:

- with latency low enough to be "comfortable enough" for the instructor to keep an eye on and approve the transcribed instructions

[not necessarily every word of the transcript, i.e., a commanded "edit" doesn't need to be applied in the outcome as long as its nature is otherwise clear enough to not add meaningful amounts of ambiguity to the final "written" instructions]

by glancing at the screen while dictating the explanation (and blurting out any transcription complaints as soon as that's possible without breaking one's own train of thought or spoken grammar too much)

, I'd very happily switch to that approach instead of what I was doing.

Bonus if there's a no-bulky-or-expensive-hardware way to accommodate us both speaking over each other, so I won't have to _interrupt_ his speaking just to put a clarifying comment (on what he just said) in the transcript for him to see and sign off on. That way it at least "only" briefly interrupts his thoughts while he actually reads my transcribed words (he doesn't have to hear them, and it's better if he doesn't; I can probably get him to put on earmuffs so he doesn't hear me louder than he hears his own thoughts, and a sufficiently smoothed SNR meter for specifically his voice should take care of him regulating his volume while the earmuffs mute it and I occasionally talk over him)...
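For the "sufficiently smoothed" meter part, a rough sketch of what I have in mind, in Python with only the standard library. Big caveat: this only smooths the overall input level in dBFS; it is not a real SNR estimate and does nothing to isolate one speaker's voice, which would need a separate mic or speaker separation. The names `rms_dbfs` and `SmoothedMeter`, the 16-bit mono PCM input format, and the `alpha` and `floor_db` defaults are all assumptions for illustration.

```python
import math
import struct


def rms_dbfs(pcm16: bytes) -> float:
    """RMS level of little-endian 16-bit mono PCM, in dB relative to full scale."""
    count = len(pcm16) // 2
    if count == 0:
        return -math.inf
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    if mean_square == 0:
        return -math.inf  # digital silence
    return 10 * math.log10(mean_square / (32768.0 ** 2))


class SmoothedMeter:
    """Exponential moving average over per-frame dBFS readings (hypothetical helper)."""

    def __init__(self, alpha=0.1, floor_db=-90.0):
        self.alpha = alpha        # smaller alpha = slower, smoother needle
        self.floor_db = floor_db  # clamp so silence doesn't produce -inf
        self.level = floor_db

    def update(self, pcm16: bytes) -> float:
        reading = max(rms_dbfs(pcm16), self.floor_db)
        # Move the displayed level a fraction of the way toward the new reading.
        self.level += self.alpha * (reading - self.level)
        return self.level
```

Feeding it one short audio frame at a time (say 20-50 ms each) and displaying `level` would give a needle that doesn't jump on every syllable; how small `alpha` needs to be to feel "comfortable" is something to tune by ear.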

[flagged]

LLM account

You're right, I just downloaded it on Handy and it's working. I can't believe it.

I was using AssemblyAI, but this is fast, accurate, and offline. WTF!

Parakeet is amazing; it has completely ousted Whisper for me. On Linux, both handy.computer and Epicenter Whispering (using Parakeet, of course) work incredibly well for set-and-forget STT. I use it constantly to write messages on Slack/Teams, debate with Claude Code, etc. Both have minor bugs, but I can easily accept those, these apps being FOSS and all.

On Mac, I've been using VoiceInk and it's even better. VoiceInk (and MacWhisper too, IIRC) uses the Neural Engine, and the delay between dictation and the typed text appearing is almost imperceptible.

What's wrong with Piper?