I'm a diagnostic radiologist with 20 years of clinical experience, and I have been programming computers since 1979. I need to challenge one of your core assumptions.
> Can AI read diagnostic images better than a radiologist? Almost certainly the answer is (or will be) yes.
I'm sorry, but I disagree, and I think you are making a wild assumption here. I am up to date on the latest AI products in radiology, use several of them, and none of them are even in the ballpark on this. The vast majority are non-contributory.
It is my strong belief that there is an almost infinite variation in both human anatomy and pathology. Given this variation, I believe that in order for your above assumption to be correct, the development of "AGI" will need to happen.
When I interpret a study I am not just matching patterns of pixels on the screen with my memory. I am thinking, puzzling, gathering and synthesizing new information. Every day I see something I have never seen before, and maybe no one has ever seen before. Things that can't and don't exist in a training data set.
I'm on the back end of my career now and I am financially secure. I mention that because people will assume I'm a greedy and ignorant Luddite doctor trying to protect my way of life. On the contrary, if someone developed a good replacement for what I do, I would gladly lay down my microphone and move on.
But I don't think we are there yet, in fact I don't think we're even close.
Can a human reliably carefully study for hours on end imaging from screening tests (think of a future world where whole-body MRI scanning for asymptomatic people becomes affordable and routine thanks to AI processing) and not miss subtle anomalies?
I can easily imagine that humans are better at really digging deeply and reasoning carefully about anomalies that they notice.
I doubt they're nearly as good as computers at detecting subtle changes on screens where 99% of images have nothing worrisome and the priors are "nothing is suspicious".
I don't want to equate radiologists with TSA screeners, but the false negative rate for TSA screening of carryon bags is incredibly high. I think there's an analog here about the ability of humans to maintain sustained focus on tedious tasks.
> Can a human reliably carefully study for hours on end imaging from screening tests
This is actually very common in radiology: some positions have shifts of 8-12 hours, and one isn't done until all the studies on the list have been read.
> think of a future world where whole-body MRI scanning for asymptomatic people becomes affordable and routine thanks to AI processing) and not miss subtle anomalies?
The bottleneck in MRI is not reading but instead the very long acquisition times paired with the unavailability of the expensive machinery.
If we charitably assume that you're thinking of CT scans, some studies on indiscriminate imaging indicate that most findings will be false positives:
https://pmc.ncbi.nlm.nih.gov/articles/PMC6850647/
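To make the base-rate point concrete, here is a back-of-the-envelope positive predictive value calculation. The prevalence, sensitivity, and specificity numbers below are made-up illustrative assumptions, not figures from the linked study:

```python
# Why low prevalence means most flagged findings are false positives.
# All three numbers are illustrative assumptions, not from the linked paper.
prevalence = 0.01    # assume 1% of asymptomatic screened people truly have the finding
sensitivity = 0.95   # assume 95% of true cases get flagged
specificity = 0.95   # assume 5% of healthy scans get flagged anyway

true_pos = prevalence * sensitivity
false_pos = (1 - prevalence) * (1 - specificity)

ppv = true_pos / (true_pos + false_pos)
print(f"Positive predictive value: {ppv:.1%}")  # ~16%: most positives are false alarms
```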
Do any of these models know how to say "I don't know"? This is one of my biggest worries about these models.
> When I interpret a study I am not just matching patterns of pixels on the screen with my memory.
Seems like an oversimplification, but let's say it's just true. Wouldn't you rather spend your time on novel problems that you haven't seen before? Some ML system identifies easy/common ones that it has high confidence in, leaving the interesting ones for you?
Yes, that would be ideal, if we could build such a system. I think we cannot with current tech.
Your belief is held by many, many radiologists. One thing I like to highlight is that LLMs and LVMs are much more advanced than any model in the past. In particular, they do not require specific training data to contain a diagnosis. They don't even require specific modality data to make inferences.
Think about how you learned anatomy. You probably looked at Netter drawings or Gray's long before you ever saw a CT or MRI. You probably knew the English word "laceration" before you saw a liver lac. You probably knew what a ground glass bathroom window looked like before the term was used to describe lung findings.
LLMs/LVMs ingest a huge amount of training data, more than humans can appreciate, and learn connections between that data. I can ask these models to render an elephant in outer space with a hematoma on its snout in the style of a CT scan. Surely, there is no such image in the training set, yet the model knows what I want from the enormous number of associations in its network.
Also, the word "finite" has a very specific definition in mathematics. It's a natural human fallacy to equate very large with infinite. And the variation in images is finite. Given a 16-bit, 512 x 512 x 100 slice CT scan, you're looking at 2^16 * 26214400 possible images. Very large, but still finite.
Of course, the reality is way, way smaller. As a human, you can't even look at the entire grayscale spectrum. We just say, < -500 Hounsfield units (HU), that's air, -200 < fat < 0, bone/metal > 100, etc. A gifted radiologist can maybe distinguish 100 different tissue types based on the HU. So, instead of 2^16 pixel values, you have...100. That's 100 * 26214400 = 262,440,000 possible CT scans. That's a realistic upper-limit on how many different CT scans there could possibly be. So, let's pre-draft 260 million reports and just pick the one that fits best at inference time. The amount you'd have to change would be minuscule.
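(For the non-radiologists following along, here is a minimal sketch of that HU-window idea in code. The cutoffs and the handful of bins are the rough values mentioned above, simplified for illustration, not a clinical lookup table.)

```python
import numpy as np

# Crude Hounsfield-unit binning, along the lines described above.
# Thresholds and categories are simplified illustrative assumptions.
def classify_hu(hu: np.ndarray) -> np.ndarray:
    labels = np.full(hu.shape, "soft tissue", dtype=object)
    labels[hu < -500] = "air"
    labels[(hu >= -200) & (hu < 0)] = "fat"
    labels[hu > 100] = "bone/metal"
    return labels

sample_voxels = np.array([-900, -150, 40, 300])  # a few example HU values
print(classify_hu(sample_voxels))                # ['air' 'fat' 'soft tissue' 'bone/metal']
```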
Maybe I’m misunderstanding what you’re calculating, but this math seems wildly off. I sincerely don’t understand what alternative numerical point is being made.
> Given a 16-bit, 512 x 512 x 100 slice CT scan, you're looking at 2^16 * 26214400
That's 65536^(512*512), or 65536 multiplied by itself 262144 times, for each image. An enormous number. Whether or not we assume replacement (duplicates) is moot.
> That's 100 * 26214400 = 262,440,000
There are 100^(512*512) possible 512x512 100-level grayscale images alone, i.e. 100 to the 262144th power, 100 multiplied by itself 262144 times. Again, how are you paring a massive combinatoric space down to a reasonable 262 million?
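To put the size of that space in perspective, a quick sketch (counting only single 512x512 slices with 100 gray levels, per the numbers above):

```python
import math

# The number of distinct 512x512 images with 100 gray levels per pixel is 100 ** (512 * 512).
# Compute its size in decimal digits via logarithms instead of materializing it.
pixels = 512 * 512                   # 262,144 pixels per slice
digits = pixels * math.log10(100)    # log10(100 ** pixels)
print(f"100^{pixels} has roughly {digits:,.0f} decimal digits")
# ~524,288 digits, versus the 9 digits of 262,440,000
```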
Hi aabajian, thanks for replying!
I might quibble with your math a little. Most CTs have more than 100 images, in fact as you know stroke protocols have thousands. And many scans are reconstructed with different kernels, i.e. soft tissue, bone, lung. So maybe your number is a little low.
Still your point is a good one, that there is probably a finite number of imaging presentations possible. Let's pre-dictate them all! That's a lot of RVUs, where do I sign up ;-)
Now, consider this point. Two identical scans can have different "correct" interpretations.
How is that possible? To simplify things, consider an x-ray of a pediatric wrist. Is it fractured? Well, that depends. Where does it hurt? How old are they? What happened? What does the other wrist look like? Where did they grow up?
This may seem like an artificial example, but I promise you it is not. There can be identical x-rays, and one is fractured and one is not.
So add this example to the training data set. Now do this for hundreds or thousands of other "corner cases". Does that head CT show acute blood, or is that just a small focus of gyriform dystrophic calcification? Etc.
I guess my point is, you may end up being right. But I don't think we are particularly close, and LLMs might not get us there.
Haha, I’m also an IR with AI research experience.
My view is much more in line with yours and this interpretation.
Another point - I think many people (including other clinicians) have a sense that radiology is a practice of clear cut findings and descriptions, when in practice it’s anything but.
At another level beyond the imaging appearance and clinical interpretation is the fact that our reports are also interpreted at a professional and “political” level.
I can imagine a busy neurosurgeon running a good practice calling the hospital CEO to discuss unforgiving interpretations of post op scans from the AI bot……
> I can imagine a busy neurosurgeon running a good practice calling the hospital CEO to discuss unforgiving interpretations of post op scans from the AI bot……
I have fielded these phone calls, lol, and would absolutely love to see ChatGPT handle this.
Johns Hopkins has an in-house AI unit where they train their own AIs to do imaging analysis. In fact this center made the rounds a few months ago in an NYT story about AI in radiology.
What was left out was that these "cutting edge" AI imaging models were old-school CNNs from the mid-2010s, running on local computers. It seems the idea of using transformers (the architecture behind LLMs) is only now being explored.
In that sense, we still do not know what a purpose-built "ChatGPT of radiology" would be capable of, but if we use the data point of comparing AI from 2015 to AI from 2025, the step up in ability is enormous.
AI can detect a Black person vs a White person via their chest x-rays. Radiologists say there is no difference. Turns out they're wrong. https://www.nibib.nih.gov/news-events/newsroom/study-finds-a...
That being said, there are no radiologists available to hire at any price: https://x.com/ScottTruhlar/status/1951370887577706915
THERE ARE NO RADIOLOGISTS AVAILABLE TO HIRE AT ANY PRICE!!!
True, and very frustrating. Imaging volume is going parabolic and we cannot keep up! I am offering full partnership on day one with no buy-in for new hires. My group is in the top 1% of radiology income. I can't find anyone to hire, I can only steal people from other groups.
"Latest products" and "state of the art" are two very, very different classes of systems. If anything medical has reached the state of a "product", you can safely assume that it's somewhere between 5 and 50 years behind what's being attempted in the labs.
And in AI tech, even "5 years ago" is a different era.
In 2025, we have massive multimodal reasoning LLMs that can cross-reference data from different images, text, and more. If the kind of effort and expertise that went into general-purpose GPT-5 went into a more specialized medical AI, where would its capabilities top out?
> Every day I see something I have never seen before, and maybe no one has ever seen before.
Do you have any typical examples of this you could try to explain to us laymen, so we get a feel for what this looks like? I feel like it's hard for laymen to imagine how you could be seeing new things outside a pattern every day (or week).