Can someone who is a heavier AI user explain what is going on?
I would expect an "AI Note Taker" to faithfully transcribe the entire conversation, with the same quality I see in a lot of automated video subtitles, i.e. they use the wrong word a lot but it's easy to tell what they mean from context.
Are these tools instead immediately summarising the whole thing, and that summary is the artifact? Because that is a beyond insane way to treat human communication.
I work specifically in voice AI and am very familiar with how these tools and systems work.
> I would expect an "AI Note Taker" to faithfully transcribe the entire conversation, with the same quality I see in a lot of automated video subtitles, i.e. they use the wrong word a lot but it's easy to tell what they mean from context.
That's a reasonable expectation, but not a safe one. Not all transcription tools are made the same. First, it depends on what kind of STT/ASR (speech-to-text / automatic speech recognition) model they are using. A lot of tools like to use some flavor of OpenAI's Whisper model. It works well generally, but I would never use it in a critical use case like healthcare, because it can hallucinate, i.e. output text that was never actually spoken. That's specific to its architecture and how it was trained.
There's a fairly large variety of architectures that can be used for STT/ASR. Some of them are designed for "offline" / "batch" / pre-recorded audio. Some are designed for fast real-time streaming transcription.
There are more factors too, like training data. And not just the demographics of the speakers in the training data, but the audio environments too. Was the model trained on echoey doctors' offices with two people being recorded from a crappy smartphone mic or desktop mic? (It could've been! But it's an important distinction.)
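If you want to put a number on "uses the wrong word a lot" for your own recording conditions, one common approach is word error rate (WER): word-level edit distance between a hand-corrected reference and the tool's output, divided by the reference length. A minimal sketch, assuming you have both texts as plain strings (the function name and the very crude lowercase/whitespace normalization are my own choices, not any tool's API):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Crude normalization (assumption): lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] = edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the hypothesis
            insertion = cur[j - 1] + 1  # extra word in the hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

Note that WER treats every word the same: swapping "ten" for "two" in a dosage counts exactly as much as a dropped "um", so a low score on its own doesn't tell you the transcript is safe.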
And there are more factors than that, but you get the picture (e.g. are they trying to "clean up" the transcript afterwards by feeding it to an LLM? Are they pre-processing the audio before transcription in an attempt to boost accuracy?).
There are a lot of ways to do it, meaning there are a lot of ways to screw it up.
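To make the moving parts concrete, here's a rough sketch of the kind of pipeline I'm describing. Everything in it is hypothetical: `take_notes`, `NoteResult`, and the three stage callables are placeholders I made up for illustration, not any vendor's actual API. The one design choice it shows is keeping the raw STT output next to whatever the LLM "cleanup" step produces, so the two can be compared afterwards.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class NoteResult:
    raw_transcript: str                # exactly what the STT model returned
    cleaned_transcript: Optional[str]  # LLM-edited version, if a cleanup stage ran


def take_notes(
    audio: bytes,
    transcribe: Callable[[bytes], str],                   # hypothetical STT/ASR stage
    preprocess: Optional[Callable[[bytes], bytes]] = None,  # hypothetical, e.g. denoising
    cleanup: Optional[Callable[[str], str]] = None,         # hypothetical LLM rewrite
) -> NoteResult:
    # Optional audio pre-processing: can help accuracy, can also distort the signal.
    if preprocess is not None:
        audio = preprocess(audio)

    # The STT model itself: the first place words can go wrong.
    raw = transcribe(audio)

    # Optional LLM cleanup: a second place the text can drift from what was said.
    cleaned = cleanup(raw) if cleanup is not None else None

    return NoteResult(raw_transcript=raw, cleaned_transcript=cleaned)
```

With stub stages, something like `take_notes(audio, transcribe=my_stt, cleanup=my_llm_fix)` hands back both versions. If a tool only ever keeps the cleaned (or summarized) text, there's nothing left to check it against.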