Our meetings often involve a mix of onsite and offsite employees. A typical setup might be the CEO, the CTO, and a VP in a room, connected to the call as a single Zoom client (whichever of the three got to the meeting room first), plus a few additional people joining remotely from home, each on their own Zoom instance. The people in the meeting room use a dedicated camera that captures the entire room and has all participants in view. Is this a setup you are trying to address? If so, how are you able to recognize speakers in this configuration?

Most transcript systems we have tried bundle everything said by the onsite people into a single entity, which pretty much destroys the value of the transcript, especially if people in that room disagree with each other. Reading the transcript makes it feel like the onsite guy is constantly arguing with himself.

That’s a great question! We partner with a number of different transcription providers that use AI to tell speakers apart based on the sound of their voice. This prevents all the speakers in a conference room from being bundled together as the same person. We’re also looking to add this functionality to our own transcription service in the coming months.