https://telephone.health, which shows how well LLMs can take narrative medical text, convert it to a structured form (FHIR R4, for application consumption), and then convert it back to narrative text for human consumption.
Interesting findings include Mistral doing better than Gemini 3 Pro in certain use cases, cross-LLM working better than one LLM to another, oh, and the cost of all of this. So, so expensive.
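
For anyone wondering what the round trip (narrative to FHIR R4 and back to narrative) looks like in code, here is a rough sketch. It is not how the site does it: `fake_to_fhir` and `fake_to_narrative` are hypothetical stand-ins for the two LLM calls, and the character-level similarity score is just an assumed placeholder metric.

```python
import json
from difflib import SequenceMatcher
from typing import Callable, Tuple


def fake_to_fhir(narrative: str) -> dict:
    """Hypothetical stand-in for LLM #1: wrap the text in a minimal FHIR R4 Condition."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": "Patient/example"},
        "code": {"text": narrative},
    }


def fake_to_narrative(resource: dict) -> str:
    """Hypothetical stand-in for LLM #2: read the free text back out of the resource."""
    return resource["code"]["text"]


def round_trip(
    narrative: str,
    to_fhir: Callable[[str], dict],
    to_narrative: Callable[[dict], str],
) -> Tuple[str, float]:
    """Run narrative -> FHIR -> narrative and score how much text survived."""
    resource = to_fhir(narrative)
    # Serialize and parse again to mimic passing JSON between two separate models.
    resource = json.loads(json.dumps(resource))
    restored = to_narrative(resource)
    # Assumed metric: character-level similarity in [0, 1]; 1.0 means identical text.
    score = SequenceMatcher(None, narrative, restored).ratio()
    return restored, score


restored, score = round_trip(
    "Patient reports chest pain.", fake_to_fhir, fake_to_narrative
)
print(restored, score)
```

Swapping in a different callable for each direction is what a cross-LLM run would look like in this sketch; passing the same model for both is the single-LLM case.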