It's not unreasonable to think that the acceptable level of risk for "the language model parsed my text wrong" is on average much higher than for "the medical model misdiagnosed my condition". You can probably come up with scenarios where a language model behaving unexpectedly would have drastic consequences, if you imagine it hooked up to an automated system where it has immediate control over actions that can't easily be reversed. But that's exactly why it's a bad idea to use them like that, and those cases are the exception rather than the rule. For medical models, it seems plausible that such scenarios are a lot closer to the norm than the exception, in which case our tolerance for them "filling in the gaps" incorrectly would need to be much smaller.
This doesn't need to be a diagnostic model, just a data source for existing doctors.