Same thing with Computer Vision: as Andrew Ng pointed out, the main driver of rapid progress was not new models but large _labeled_ datasets, particularly ImageNet.
Yes, larger usable datasets, paired with an acceleration of mainstream parallel computing power (GPUs) and increasing algorithmic flexibility (CUDA).
Without all three, progress would have been much slower.
Do you have a link handy for where he says this explicitly?
Here's an older interview where he talks about the need for accurate dataset labeling:
"In many industries where giant data sets simply don’t exist, I think the focus has to shift from big data to good data. Having 50 thoughtfully engineered examples can be sufficient to explain to the neural network what you want it to learn."
https://spectrum.ieee.org/andrew-ng-data-centric-ai