ImageNet is one of the most popular datasets on the planet. It turns out a significant fraction of its images are mislabeled. In the limit, a model can only score above a certain benchmark accuracy by fitting those wrong labels.
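To make the limit case concrete, here is a minimal sketch (the `noise_rate` and the oracle classifier are hypothetical illustrations, not measurements of ImageNet): a classifier that always predicts the *true* class is still scored against the noisy benchmark labels, so its measured accuracy tops out near 1 − noise_rate, and anything scoring above that line must be reproducing some of the wrong labels.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_classes = 10_000, 10
noise_rate = 0.06  # hypothetical fraction of mislabeled examples

true_labels = rng.integers(0, n_classes, size=n_samples)

# Corrupt a random subset: each flipped label becomes a *different* class.
flip = rng.random(n_samples) < noise_rate
offsets = rng.integers(1, n_classes, size=n_samples)
dataset_labels = np.where(flip, (true_labels + offsets) % n_classes, true_labels)

# An oracle that always predicts the true label is scored against the
# noisy dataset labels, so its measured accuracy is capped near 1 - noise_rate.
oracle_accuracy = np.mean(true_labels == dataset_labels)
print(f"oracle accuracy on noisy benchmark: {oracle_accuracy:.3f}")
# Any model scoring above this must be matching some of the wrong labels.
```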
The answer is “it works because ML wants to work.” It’s surprising how far you can get with something flawed. It’s also why such huge breakthroughs are possible by noting flaws others haven’t.
> It’s also why such huge breakthroughs are possible by noting flaws others haven’t.
I make this sort of breakthrough at home all the time! My wife will say the computer is doing something strange, and instead of just randomly clicking around, I read the error messages slowly and out loud, then follow what they say. Anyone can do this, yet it seems like a magical ability every time you employ it to help people.
Has it been reasonably possible to overfit to the errors in ImageNet, or are they effectively random noise?