Hacker News

Native diarization, this looks exciting. edit: or not, no diarization in real-time.

https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...

~9GB model.

The diarization is on Voxtral Mini Transcribe V2, not Voxtral Mini 4B.

Do you have experience with that model for diarization? Does it feel accurate, and what's its realtime factor on a typical GPU? Diarization has been the biggest thorn in my side for a long time.

ashenke 2 months ago

You can test it yourself for free here: https://console.mistral.ai/build/audio/speech-to-text

I tried it on an English-language podcast episode, and apart from identifying one host as two different speakers (only once, for a few sentences at the start), the rest was flawless from what I could see.

sbrother 2 months ago

Amazing. Thank you.

coder543 2 months ago

> Do you have experience with that model

No, I just heard about it this morning.

observationist 2 months ago

Ahh, yeah, and it's explicitly not working for realtime streams. Good catch!