In the dark ages of machine learning, researchers tried to fit natural language into a defined, human-curated taxonomy.
It kinda worked for a reasonable amount of stuff, but it failed quite a lot of the time, and there's an extremely long tail of things that would have been practically impossible to ever address with that method. Handling them took adopting an entirely new, unsupervised model of language, continuous in places where the old way was discrete.
It's not unreasonable to think that the level of acceptable risk for "the language model parsed my text wrong" is on average much higher than for "the medical model misdiagnosed my condition". You can probably come up with scenarios where a language model behaving unexpectedly would have drastic consequences, if you imagine it hooked up to automatic systems where it has immediate control over actions that can't easily be reversed. But like, that's why it's a bad idea to use language models that way, and those scenarios are the exception rather than the rule. It seems plausible that for medical models such scenarios are a lot closer to the norm than the exception, in which case our tolerance for them "filling in the gaps" incorrectly would need to be much smaller.
This doesn't need to be a diagnostic model, just a data source for existing doctors.