> NNs were largely dismissed
I agree with your larger point, but "dismissed" is rather too strong. They were considered fiddly to train: prone to local minima, slow to converge, and with no clear guidelines about how many hidden layers and nodes one ought to use. But for homework (toy) exercises they were still fine.
In comparison, kernel methods gave a better experience overall for large (but not huge) data sets. Most models had a convex objective with an easily obtainable global minimum, fewer moving parts, and very good performance.
It turns out, however, that if you have several orders of magnitude more data, the usual kernels are too simple: (i) past a point they cannot take advantage of more data and just start twiddling the 10th decimal place of some parameters, and (ii) they are expensive to train on very large data sets (exact methods materialize the n x n Gram matrix, so O(n^2) memory and typically O(n^3) time). So a bit of a double whammy. Well, there was a third problem: no hardware acceleration that could compare with GPUs.
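To make (ii) concrete, here's a minimal kernel ridge regression sketch in plain NumPy (function names are mine, not any particular library's). It also shows the "easily obtainable global minimum" from above, since the convex objective reduces to a single linear solve:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def fit_kernel_ridge(X, y, lam=1e-3, gamma=1.0):
    # Convex objective, so the global minimum is one linear solve:
    #   alpha = (K + lam * I)^{-1} y
    # But K is n x n: O(n^2) memory to store, O(n^3) time to solve.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    # f(x) = sum_i alpha_i * k(x, x_i)
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Tiny demo: fine at n = 500; try n = 50_000 and watch memory blow up.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
alpha = fit_kernel_ridge(X, y)
print(predict(X, alpha, X[:5]))
```

At n = 10^4 the Gram matrix alone is ~800 MB in float64; at n = 10^6 it would be ~8 TB, never mind the cubic solve. That's point (ii) in two lines of arithmetic.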
Kernels may make a comeback, though; you never know. We need a way to compose kernels in a user-friendly way to increase their modeling capacity. We had a few ways of doing just that (see the sketch below), but they weren't great. We need a breakthrough to scale them to GPT-sized data sets.
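For context, those "few ways" were mostly closure properties: sums, products, and nonnegative scalings of positive-definite kernels are again positive definite, so you can hand-compose a richer kernel from simple parts. A toy sketch (all names mine):

```python
import numpy as np

def rbf(gamma):
    return lambda x, y: np.exp(-gamma * np.sum((x - y) ** 2))

def linear():
    return lambda x, y: float(np.dot(x, y))

# Closure properties: these operations preserve positive definiteness,
# which is what makes the composite a legal kernel.
def k_sum(k1, k2):
    return lambda x, y: k1(x, y) + k2(x, y)

def k_prod(k1, k2):
    return lambda x, y: k1(x, y) * k2(x, y)

def k_scale(c, k):  # requires c >= 0
    return lambda x, y: c * k(x, y)

# e.g. a linear trend plus a scaled smooth component:
k = k_sum(linear(), k_scale(0.5, rbf(gamma=1.0)))
x, y = np.ones(3), np.zeros(3)
print(k(x, y))  # 0.0 + 0.5 * exp(-3.0)
```

The catch, as I say below: the composition is chosen by hand (or by expensive discrete search over structures), rather than learned from the data the way a DNN effectively learns its own feature map.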
In a way, DNNs are "design your own kernels using data," whereas kernels came in any color you liked provided it was black. (Yes, there were many types, but it was still a fairly limited catalogue. The killer was that there was no good way of composing them for more modeling capacity that also yielded efficiently trainable kernel machines.)