The only issue I have with those tools, and I have not seen a single one even acknowledge this, is that they become completely useless when meetings are held in a hybrid fashion, where some people are remote and others are in the office sharing a mic.
Almost all of our meetings are hybrid in this way, and it's a real pain having almost half of the meeting be identified as a single individual talking because the mic is hooked up to their machine.
It's a total dealbreaker for us, and we won't use such tools until that problem is solved.
It can be solved with speaker segmentation/embedding models, although it is not perfect. One thing we do with Hyprnote is that we have a Descript-like transcript editor that allows you to easily edit/assign speakers. Once we integrate a speaker diarization model with that, I think we'll be in good shape.
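To make the "embedding models" part concrete, here is a minimal sketch of the clustering step, assuming some model has already turned each speech segment into a fixed-length embedding vector. It is not Hyprnote's implementation. The function names, the toy 2-D vectors, and the 0.7 similarity threshold are all made up for illustration, and a real diarization pipeline would be considerably more involved.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_speakers(embeddings, threshold=0.7):
    """Greedy clustering: give each segment embedding a speaker index.

    A segment joins the most similar existing speaker if the similarity
    reaches `threshold` (an arbitrary, illustrative value); otherwise it
    starts a new speaker.
    """
    centroids = []  # running sum of embeddings per speaker
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            # No existing speaker is similar enough: open a new cluster.
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            # Fold the segment into the matched speaker's running sum.
            centroids[best] = [x + y for x, y in zip(centroids[best], emb)]
        labels.append(best)
    return labels


# Toy embeddings standing in for real model output.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
labels = assign_speakers(segments)
```

This is also where the "not perfect" caveat shows up: a threshold that is too low merges different people in the room into one speaker, and one that is too high splits a single person into several, which is why a manual speaker editor remains useful.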
If you are interested, you can join our Discord and follow updates. :) https://hyprnote.com/discord
Oh awesome, I was reading through to see whether it had speaker diarization (why I got rid of the whisper script I used to use).
I'll look forward to the Linux version.
Is there any chance of a headless mode? (I.e. start, and write transcript to stdout with some light speaker diarization markup. e.g. "Speaker1: text")
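For what it's worth, the output side of such a headless mode could be as small as the sketch below. It assumes a hypothetical upstream step that yields `(speaker_index, text)` pairs; that tuple format and the `format_transcript` name are assumptions for illustration, not anything Hyprnote exposes.

```python
import sys
from itertools import groupby


def format_transcript(segments):
    """Merge consecutive same-speaker segments into 'SpeakerN: text' lines.

    `segments` is an iterable of (speaker_index, text) pairs, a made-up
    format chosen for this sketch.
    """
    lines = []
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        text = " ".join(part.strip() for _, part in group)
        lines.append(f"Speaker{speaker + 1}: {text}")
    return lines


if __name__ == "__main__":
    demo = [(0, "Hi all."), (0, "Shall we start?"), (1, "Yes, go ahead.")]
    sys.stdout.write("\n".join(format_transcript(demo)) + "\n")
```

Writing plain lines to stdout keeps it easy to pipe into grep or a summarizer.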
> Is there any chance of a headless mode?
maybe. we might be able to add an extension system where each extension can access that info and do whatever it wants within the app.
> I'll look forward to the Linux version.
https://github.com/fastrepl/hyprnote/issues/67 We have an open issue. You might want to subscribe to it!
our conference rooms even have some sort of rotating camera contraption that automatically focuses on the person speaking
I forbid this kind of meeting on my teams.
Either everyone is in the same physical room, or everyone is remote.
The quality of communication plummets in the hybrid case:
* The physical participants have much higher bandwidth communication than those who are remote — they share private expressions and gestures to the detriment of remote.
* The physical participants have massively lower latency communications. In all-online meetings, everyone can adjust and accommodate the small delays; in hybrid meetings it often locks out remote participants who are always just a little behind or have less time to respond.
* The audio quality of remote is significantly worse, which I have seen result in their comments being treated as less credible.
* Remote participants usually get horrible audio quality from those sharing a mic in the room. No one ever acknowledges this, but it dramatically impacts their ability to communicate.
you might need an AI for in-person meetings first. Such tools are available to doctors who see patients. The note taking is great but I think it is skewed towards a one-person summary where the name of the patient remains unknown. I wonder if the same tool can take notes if two patients are in the room and distinguish between each one.
The second tool is likely a hardware limitation: a multi-cam, multi-mic setup with beamforming capability to separate overlapping sounds.
hyprnote can be used for in-person meetings as well! we have doctors like ophthalmologists or psychiatrists using it right now. and yes - definitely going to be working on speaker identification as it is crucial.
I recently tried Vibe (https://github.com/thewh1teagle/vibe) on a recording of a meeting taken from one side. It was able to tell the speakers apart, though only as Speaker 1, 2, etc. Still useful to see.
yeah vibe is a great app. we're actually friends with the maintainer :)
I think if you put N-1 mics in the room (where N is the number of people) you could easily identify all individuals...