If it weren't the font, it might be anomalies in the image acquisition or even in the encoder software. You can never really be sure what exactly the ML is detecting.

Exactly. A marginally higher ISO at one imaging site versus a lower ISO at another could have a similar effect, and it would be quite difficult to detect.
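To make that concrete, here's a minimal sketch (all names like `simulate_site` and `hf_energy` are made up for illustration): two hypothetical imaging sites produce the same underlying signal but with slightly different sensor noise, and a trivial "classifier" that thresholds on noise level alone can separate them almost perfectly. That's the kind of shortcut a model could latch onto without anyone noticing.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_site(noise_sigma, n=200, size=32):
    """Hypothetical scans: identical phantom image, site-specific sensor noise."""
    phantom = np.full((n, size, size), 0.5)   # same underlying content everywhere
    return phantom + rng.normal(0.0, noise_sigma, (n, size, size))

site_a = simulate_site(0.05)   # lower-noise site
site_b = simulate_site(0.08)   # marginally noisier site

def hf_energy(imgs):
    """Mean absolute horizontal pixel difference: a crude per-image noise estimate."""
    return np.abs(np.diff(imgs, axis=-1)).mean(axis=(1, 2))

# A "shortcut classifier": a single threshold on estimated noise level
# separates the two sites, despite the images containing identical content.
threshold = (hf_energy(site_a).mean() + hf_energy(site_b).mean()) / 2
accuracy = ((hf_energy(site_a) <= threshold).mean()
            + (hf_energy(site_b) > threshold).mean()) / 2
```

If one site happened to see more positive cases, a model exploiting this statistic would look accurate while learning nothing about anatomy.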

You can give it the same tests the human radiologists take in school.

They do take tests, don't they?

They don't all score 100% every time, do they?

The point here is that the radiologist has a sense of which light patterns it is sensible to draw conclusions from and which it is not, because the radiologist has a concept of real-world 3D objects.

Sure. It's just not a valid point. Even if it's valid today, it won't be by next week.

Why not? That's what Grad-CAM is for, right?
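For anyone unfamiliar: Grad-CAM weights a conv layer's feature maps by the gradient of the class score with respect to those maps, sums them, and applies a ReLU, giving a coarse heatmap of which regions supported the prediction. A minimal sketch of that computation (assuming you've already captured the activations and gradients, e.g. via framework hooks):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from one conv layer.
    feature_maps: activations, shape (C, H, W)
    gradients:    d(class score)/d(activations), same shape."""
    weights = gradients.mean(axis=(1, 2))              # global-average-pool the gradients
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0)                           # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                          # normalize to [0, 1] for display
    return cam
```

Note the heatmap is at the conv layer's spatial resolution and is usually upsampled onto the input image, so "which pixels" is already an approximation.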

What if the ML draws its conclusion from exactly the right pixels, but the cause is a rasterization issue?