Could you make it use Parakeet? That's an offline model that runs very quickly even without a GPU, so you could get much lower latency than using an API.
I love this idea, and originally planned to build it using local models, but to get post-processing (that's where you get correctly spelled names when replying to emails, etc.), you also need a local LLM.
If you do that, the total pipeline takes too long for the UX to be good (5-10 seconds per transcription instead of under 1 second). I also had concerns about battery life.
Some day!
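For anyone curious, here is a rough sketch of the kind of two-stage pipeline being described, not any actual implementation: Parakeet via NVIDIA's NeMo for the transcription pass, then a local LLM for the post-processing pass that fixes names and punctuation. The Ollama endpoint and both model names are illustrative assumptions; it's the second pass that pushes total latency into the multi-second range on CPU.

```python
# Sketch of a local transcribe -> post-process pipeline (model names illustrative).
import nemo.collections.asr as nemo_asr
import requests

# Parakeet checkpoint loaded through NeMo; runs fast even on CPU.
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")

def transcribe(path: str) -> str:
    # transcribe() takes a list of audio files and returns one result per file;
    # newer NeMo versions return hypothesis objects with a .text field.
    result = asr_model.transcribe([path])[0]
    return getattr(result, "text", result)

def post_process(raw: str, context: str) -> str:
    # Second pass through a local LLM (assumed here to be served by Ollama)
    # to correct names/spelling using the email thread as context.
    prompt = (
        "Correct names, spelling and punctuation in this dictation.\n"
        f"Context:\n{context}\n\nDictation:\n{raw}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2", "prompt": prompt, "stream": False},
    )
    return resp.json()["response"]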
https://github.com/cjpais/Handy
It’s free and offline
Wow, Handy looks really great and super polished. Demo at https://handy.computer/
[I'm using] Handy myself right now. And it's pretty good. I don't have any problems with it, except that I wish that it would slowly roll out the text as you talk instead of waiting to transcribe into the very end. because I like to rant and ramble a little bit and then go back and edit what I've written rather than having to perfectly compose on the first attempt. And that's one of the big advantages, in my opinion, of using a voice to text app is that it would let you ramble and rant and see what you have said and keep making additions and alterations to that. For instance, I'm doing this entire bit using handy in one stream of thought take. And so it's probably gonna be a bit rambly and not very polished, but at the same time it's more representative of a general use case. And I'm talking quite a bit so that I can actually put the system under stress and see how well it responds.
My only issue with it was that it cut off the words [I'm using] at the beginning, and obviously it doesn't insert paragraph breaks. It took about 25 seconds to transcribe all of that on a 10th gen i7 laptop processor.
If they could incorporate typing out what was said while you're still talking, it would be pretty perfect.
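For what it's worth, that "type as you talk" behaviour can be approximated by re-transcribing a growing audio buffer every couple of seconds and only emitting the prefix that stays stable between passes. A minimal sketch, assuming sounddevice for capture and any local model wrapped in a transcribe_fn; this is not how Handy works internally:

```python
# Sketch of incremental "roll out the text as you talk" transcription.
import queue
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16_000
audio_chunks: "queue.Queue[np.ndarray]" = queue.Queue()

def on_audio(indata, frames, time_info, status):
    # sounddevice delivers float32 blocks of shape (frames, channels)
    audio_chunks.put(indata.copy())

def common_prefix(a: str, b: str) -> str:
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def stream_transcribe(transcribe_fn, interval_s: float = 2.0):
    """transcribe_fn: any function mapping a float32 waveform to text
    (e.g. a Parakeet or Whisper wrapper)."""
    buffer = np.zeros(0, dtype=np.float32)
    emitted = ""
    previous = ""
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=on_audio):
        while True:
            sd.sleep(int(interval_s * 1000))
            while not audio_chunks.empty():
                buffer = np.concatenate([buffer, audio_chunks.get().flatten()])
            current = transcribe_fn(buffer)
            # Only "type" the part of the transcript that hasn't changed
            # between passes, so earlier words don't get rewritten.
            stable = common_prefix(previous, current)
            if len(stable) > len(emitted):
                print(stable[len(emitted):], end="", flush=True)
                emitted = stable
            previous = current
```

Re-running the model over the whole buffer is wasteful, but with a fast model like Parakeet it's often cheap enough, and it sidesteps the harder problem of true streaming decoding.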