Hacker News

Pretty dang close isn't the same as accurate when time and money are being exchanged. Voice->text with a noisy background is a particularly hard problem, especially on hardware that wasn't designed to limit background noise. Try it. Whisper is still the leading speech->text model in our tests, but once you add noise reduction, echo, diarization, etc., it's still a hard problem.