Your belief is held by many, many radiologists. One thing I like to highlight is that LLMs and LVMs are much more advanced than any model in the past. In particular, they do not require specific training data to contain a diagnosis. They don't even require specific modality data to make inferences.
Think about how you learned anatomy. You probably looked at Netter drawings or Gray's long before you ever saw a CT or MRI. You probably knew the English word "laceration" before you saw a liver lac. You probably knew what a ground glass bathroom window looked like before the term was used to describe lung findings.
LLMs/LVMs ingest a huge amount of training data, more than humans can appreciate, and learn connections between that data. I can ask these models to render an elephant in outer space with a hematoma on its snout in the style of a CT scan. Surely, there is no such image in the training set, yet the model knows what I want from the enormous number of associations in its network.
Also, the word "finite" has a very specific definition in mathematics. It's a natural human fallacy to equate very large with infinite. And the variation in images is finite. Given a 16-bit, 512 x 512 x 100 slice CT scan, you're looking at 2^16 * 26214400 possible images. Very large, but still finite.
Of course, the reality is way, way smaller. As a human, you can't even look at the entire grayscale spectrum. We just say, < -500 Hounsfield units (HU), that's air, -200 < fat < 0, bone/metal > 100, etc. A gifted radiologist can maybe distinguish 100 different tissue types based on the HU. So, instead of 2^16 pixel values, you have...100. That's 100 * 26214400 = 262,440,000 possible CT scans. That's a realistic upper limit on how many different CT scans there could possibly be. So, let's pre-draft 260 million reports and just pick the one that fits best at inference time. The amount you'd have to change would be minuscule.
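As a rough sketch, the HU binning described above is just a threshold lookup. Note these cutoffs are the approximate values from the comment, not clinical reference ranges:

```python
def tissue_from_hu(hu: float) -> str:
    """Map a Hounsfield unit value to a coarse tissue bin.

    Thresholds are the rough cutoffs quoted in the comment above,
    for illustration only -- not clinical values.
    """
    if hu < -500:
        return "air"
    if -200 < hu < 0:
        return "fat"
    if hu > 100:
        return "bone/metal"
    return "soft tissue / other"

print(tissue_from_hu(-1000))  # air
print(tissue_from_hu(-100))   # fat
print(tissue_from_hu(500))    # bone/metal
```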
Maybe I’m misunderstanding what you’re calculating, but this math seems wildly off. I sincerely don’t understand what alternate numerical point is being made.
> Given a 16-bit, 512 x 512 x 100 slice CT scan, you're looking at 2^16 * 26214400
It should be 65536^(512*512) per slice, i.e., 65536 multiplied by itself 262144 times for each image: an enormous number. Whether or not we assume replacement (duplicates) is moot.
> That's 100 * 26214400 = 262,440,000
There are 100^(512*512) possible 512x512 100-level grayscale images alone, i.e., 100 to the 262144th power. Again, how are you paring down a massive combinatoric space to a reasonable 262 million?
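To make the correction concrete, here's a quick sketch of the actual combinatorics (pure counting, no assumptions beyond the image dimensions already stated). The counts are far too large to write out directly, so we work in log10:

```python
import math

# A single 512x512 slice with 2^16 gray levels has (2^16)^(512*512)
# distinct possible images. Compute the base-10 exponent instead of
# the number itself.
levels = 2 ** 16
pixels = 512 * 512

log10_full = pixels * math.log10(levels)
print(f"one 16-bit 512x512 slice: ~10^{log10_full:,.0f} possible images")

# Even granting the parent's premise of only 100 distinguishable
# gray levels, the exponent barely shrinks:
log10_coarse = pixels * math.log10(100)
print(f"one 100-level 512x512 slice: ~10^{log10_coarse:,.0f} possible images")
```

Either way the exponent is in the hundreds of thousands or millions, so pre-drafting one report per possible scan is not a realistic shortcut.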
Hi aabajian, thanks for replying!
I might quibble with your math a little. Most CTs have more than 100 images; in fact, as you know, stroke protocols have thousands. And many scans are reconstructed with different kernels, i.e., soft tissue, bone, lung. So maybe your number is a little low.
Still your point is a good one, that there is probably a finite number of imaging presentations possible. Let's pre-dictate them all! That's a lot of RVUs, where do I sign up ;-)
Now, consider this point. Two identical scans can have different "correct" interpretations.
How is that possible? To simplify things, consider an x-ray of a pediatric wrist. Is it fractured? Well, that depends. Where does it hurt? How old are they? What happened? What does the other wrist look like? Where did they grow up?
This may seem like an artificial example, but I promise you it is not. There can be identical x-rays where one is fractured and one is not.
So add this example to the training data set. Now do this for hundreds or thousands of other "corner cases". Does that head CT show acute blood, or is that just a small focus of gyriform dystrophic calcification? Etc.
I guess my point is, you may end up being right. But I don't think we are particularly close, and LLMs might not get us there.
Haha, I’m also an IR with AI research experience.
My view is much more in line with yours and this interpretation.
Another point - I think many people (including other clinicians) have a sense that radiology is a practice of clear cut findings and descriptions, when in practice it’s anything but.
At another level beyond the imaging appearance and clinical interpretation is the fact that our reports are also interpreted at a professional and “political” level.
I can imagine a busy neurosurgeon running a good practice calling the hospital CEO to discuss unforgiving interpretations of post op scans from the AI bot……
> I can imagine a busy neurosurgeon running a good practice calling the hospital CEO to discuss unforgiving interpretations of post op scans from the AI bot……
I have fielded these phone calls, lol, and would absolutely love to see ChatGPT handle this.