Does it do separate speaker identification (diarization)?
What's the stack, if I may ask? (I believe WhisperX handles diarization.)