I wouldn't trust a non-radiologist to safely interpret the results of an AI model for radiology, no matter how well that model performs on benchmarks.

Similarly, a model that can do "PhD-level research" is of little use to me if I don't have my own PhD in the area it's researching, because how am I supposed to read a 20-page research report and figure out whether it's credible?

The notion of “PhD-level research” is too vague to be useful anyway. Is it equivalent to a preprint, a poster, a workshop paper, a conference paper, a journal submission, or a book? Is it expected to pass peer review at a prestigious venue, a mid-tier venue, or simply any venue at all?

There are wildly varying levels of quality among these options, even though all of them could reasonably be called “PhD-level research.”

I'm a professor who trains PhD students in cryptography, and I can say that it genuinely does have knowledge equivalent to a PhD student's. Unfortunately, I've never gotten it to produce a novel result. And occasionally it does frightening stuff, like swapping the + and * in a polynomial evaluation when I ask it to format an algorithm in LaTeX.
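
To illustrate the kind of swap I mean, here's a made-up Horner-style example (not the algorithm from my actual prompt), first correct and then with every + and * exchanged. Both versions compile, so nothing flags the error:

    % correct Horner form of p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3
    \[ p(x) = a_0 + x\bigl(a_1 + x(a_2 + x\,a_3)\bigr) \]
    % same expression with + and \cdot swapped: still valid LaTeX, silently wrong math
    \[ p(x) = a_0 \cdot \bigl(x + a_1 \cdot (x + a_2 \cdot (x + a_3))\bigr) \]

The frightening part is exactly that the corrupted version typesets beautifully; only someone who already knows the algorithm would catch it.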

Why, ask another deep research model to critique it, of course! ;-)