This is not a popular view ('AI sucks at X, but so do humans'), but I think it is valid, and we should take wins where we can, especially in healthcare. It is pretty clear that initial accuracy issues will become less and less of a problem as these technologies mature. Focusing on accuracy now as a 'see, it's bad' talking point misses the real danger: medical note takers have an exceptionally high chance of being hijacked for money, and that is the issue we need to bring attention to now.

They provide a real-time feed into a trillion-dollar industry. Just roll that around in your head for a second. Insurance companies are going to want to tap that feed in real time so they can squeeze out more money. Drug makers are going to want to tap into it so they can abuse the data. Hospitals will want to tap into it to wring more out of doctors and boost the number of billable codes per encounter. Very few entities are looking to tap into that feed to, you guessed it, help the patient.

I am for these systems (I have been involved in building them in the past), but the feeding frenzy of business interests that will obviously form around them is the thing we should be yelling and screaming about, not short-term accuracy issues.
> It is pretty clear that initial accuracy issues will become less and less of a problem as these technologies mature.
What do you base this on?
As someone who can both see the amazing things genAI can do, and who sees how utterly flawed most genAI output is, it's not obvious to me.
I'm working with Claude (Opus 4.7) every day and reviewing a steady stream of PRs from coworkers who are all-in (not just using it under a corporate mandate, like me), and I still find an unending stream of stupidity and incomprehension from these bots that astonishes me.
Claude recently output this to me:
"I've made those changes in three files:
- File 1
- File 2"
That is a vintage hallucination that could've come right out of GPT 2.0.
> That is a vintage hallucination that could've come right out of GPT 2.0.
That's because, despite the many claims to the contrary, the models haven't actually gotten any smarter. They are still just token prediction engines at the end of the day, without any understanding of what they are doing. That's why one should not rely on them.
> It is pretty clear that initial accuracy issues will become less and less of a problem as these technologies mature.
Is it?
Actually, yes. I have seen this specific industry mature from the very first fully automated note, and I have kept tabs on it ever since. The accuracy has increased massively and continues to increase due to several factors:
- Speech recognition and frontier models keep getting better at handling these types of conversations across accents, languages, and specialties. The trend here is obvious: compare GPT 4 with Opus 4.7 and there is no contest. I'd even take GPT 5.4 nano over GPT 4 right now. So, yeah, they have been improving, and, yeah, they will keep improving.
- The pipelines these models are being built into are getting much more sophisticated than just 'transcribe with x and have GPT XX clean it up'. The people building these things aren't standing still. Even if they kept using the same models, the pipeline improvements alone would make things better over time. Add in the model improvements and the gains compound.
- The companies doing this work are seeing more and more edge cases. Data matters. More and more practitioners are using these things. That means more to learn from. It also means more stories of things being wrong. If you cut your error rate in half but increase your customer base by 10x then you will be hearing about 5x the problems. We are seeing that right now.
- Providers are starting to adjust to the technology (repeating areas they know may cause trouble, adjusting their audio setups, etc.). Just like with any technology, both sides shift, and it matters. The first users were champions. The second wave was a mix of champions, haters, and people who didn't care yet. Now people are really starting to count on this technology. They know it isn't a fad and isn't going away, and they are actually using it day to day to get their work done. This means they are adjusting to it as needed to get to the next patient/note/etc.
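The "half the error rate, 10x the customers, 5x the complaints" point in the third bullet is easy to sanity-check with back-of-the-envelope arithmetic. All numbers below are made up for illustration; none come from any real vendor:

```python
# Back-of-the-envelope: the volume of visible errors scales with
# error_rate * users * notes, not with error_rate alone.
# All figures here are invented for illustration.

def error_reports(error_rate: float, users: int, notes_per_user: int = 100) -> float:
    """Expected number of erroneous notes generated in a period."""
    return error_rate * users * notes_per_user

before = error_reports(error_rate=0.10, users=1_000)    # ~10,000 bad notes
after = error_reports(error_rate=0.05, users=10_000)    # ~50,000 bad notes

# Halving the error rate while growing the user base 10x still
# means roughly 5x more error stories reaching the public.
print(round(after / before, 2))
```

So a rising count of reported errors is compatible with a falling error rate, which is the point: anecdote volume tracks adoption as much as quality.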
This stuff is just a few years old, and the gains are obvious and massive. They aren't going to suddenly stop improving. There is an argument that they will asymptotically approach some level of utility, but right now we are still gaining quickly.