I've always found it rather crazy that the power of backpropagation and artificial neural networks was doubted by AI researchers for so long. It's really only since the early 2010s that researchers started to take them seriously, despite the core algorithm (backpropagation) having been known for decades.
I remember when I learnt about artificial neural networks at university in the late 00s, my professors were really sceptical of them, rightly explaining that they became harder to train as you added more hidden layers.
See, what makes backpropagation and artificial neural networks work are all of the small optimisations and algorithmic improvements that were added on top of backpropagation. Without these improvements, training is too computationally inefficient to be practical, and you have to contend with issues like vanishing and exploding gradients.
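To make the depth problem concrete, here's a toy sketch (my own illustration, not from the comment above) of why gradients vanish or explode as they're pushed back through many sigmoid-style layers: the signal reaching the early layers is roughly a product of per-layer factors, so it shrinks or grows exponentially with depth.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 64

def grad_norm_after_backprop(scale):
    # Start with a unit gradient at the output and push it back through
    # `depth` layers of random weights, multiplying by the sigmoid's
    # derivative bound (0.25) at each layer.
    g = np.ones(width)
    for _ in range(depth):
        W = rng.normal(0, scale / np.sqrt(width), size=(width, width))
        g = (W.T @ g) * 0.25
    return np.linalg.norm(g)

print(grad_norm_after_backprop(scale=1.0))  # shrinks towards 0 (vanishing)
print(grad_norm_after_backprop(scale=8.0))  # blows up (exploding)
```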
I think Geoffrey Hinton has noted a few times that, for people like him who had been working on artificial neural networks for years, it's quite surprising that today neural networks just work, because for so long it was hard to get them to do anything. In this sense, while backpropagation is the foundational algorithm, it's not sufficient on its own. It was the many improvements made on top of backpropagation that actually made artificial neural networks work and take off in the 2010s, when some of the core components of modern neural networks started to fall into place.
I remember when I first learnt about neural networks I thought maybe coupling them with some kind of evolutionary approach might be what was needed to make them work. I had absolutely no idea what I was doing, of course, but I spent so many nights experimenting with neural networks. I just loved the idea of an artificial "neural network" being able to learn a new problem and spit out an answer. The biggest regret of my life was coming out of university and going into web development, because there were basically no AI jobs back then, and no such thing as an AI startup. If you wanted to do AI back then you basically had to be a researcher, which didn't interest me at the time.
> I remember when I first learnt about neural networks I thought maybe coupling them with some kind of evolutionary approach might be what was needed to make them work.
I did this in an artificial life simulation. It was pretty fun to see the creatures change from purely random bouncing around to movement that helped them get food and move away from things trying to eat them.
My naive vision was all kinds of advanced movement, like hiding around corners to ambush prey, but it never got close to anything like that.
As I worked with the evolutionary parameters, I began to realize more and more that evolving specific advanced traits requires lots of time and (I think) environmental complexity and the compartmentalization of groups of creatures.
There are lots of simple/dumb capabilities that help with survival, and they are much, much easier to acquire than a more advanced capability like being aware of another creature and tracking its movement on the other side of an obstacle.
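For anyone curious, the kind of loop being described, evolving the weights of a small feed-forward controller by mutation and selection instead of backpropagation, looks roughly like this. It's a minimal sketch; the network sizes, fitness function and parameters are all illustrative, not taken from the simulation above.

```python
import numpy as np

rng = np.random.default_rng(42)
N_SENSORS, N_ACTIONS, HIDDEN, POP, GENERATIONS = 4, 2, 8, 50, 100

def act(weights, sensors):
    # Tiny one-hidden-layer controller; weights is a flat vector split into layers.
    W1 = weights[:N_SENSORS * HIDDEN].reshape(N_SENSORS, HIDDEN)
    W2 = weights[N_SENSORS * HIDDEN:].reshape(HIDDEN, N_ACTIONS)
    return np.tanh(np.tanh(sensors @ W1) @ W2)

def fitness(weights):
    # Hypothetical stand-in for "did the creature reach the food": reward
    # moving towards a fixed target from random starting positions.
    score = 0.0
    for _ in range(10):
        pos, target = rng.uniform(-1, 1, 2), np.zeros(2)
        for _ in range(20):
            sensors = np.concatenate([pos, target - pos])
            pos = pos + 0.1 * act(weights, sensors)
        score -= np.linalg.norm(target - pos)  # closer is better
    return score

n_weights = N_SENSORS * HIDDEN + HIDDEN * N_ACTIONS
population = rng.normal(0, 0.5, size=(POP, n_weights))

for gen in range(GENERATIONS):
    scores = np.array([fitness(w) for w in population])
    elite = population[np.argsort(scores)[-POP // 5:]]        # keep the top 20%
    children = elite[rng.integers(0, len(elite), POP - len(elite))]
    children = children + rng.normal(0, 0.1, children.shape)  # mutate copies of the elite
    population = np.concatenate([elite, children])

print("best fitness:", scores.max())
```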
Apart from backpropagation, the biggest improvements were probably changes in the network architecture. Standard feed-forward MLPs are fairly inefficient. Then there were architectures like CNNs, LSTMs and Transformers. There were also improvements in activation functions (e.g. ReLU) and in the gradient descent method (e.g. AdamW), but I'm not sure whether these had an impact as substantial as CNNs or Transformers. Another factor was training on GPUs.
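The optimiser side of that is easy to write down: compared with a plain SGD step, Adam/AdamW keeps running estimates of the first and second moments of the gradient and (in AdamW) applies weight decay separately from the adaptive step. A rough sketch of the two update rules, using the standard formulas but my own variable names:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain gradient descent: step against the raw gradient.
    return w - lr * grad

def adamw_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # 1st moment (mean)
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # 2nd moment (variance)
    m_hat = state["m"] / (1 - beta1 ** state["t"])             # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    w = w - lr * weight_decay * w                              # decoupled weight decay
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)             # per-parameter scaled step

w = np.zeros(3)
state = {"t": 0, "m": np.zeros(3), "v": np.zeros(3)}
grad = np.array([0.5, -2.0, 0.01])
print(sgd_step(w, grad))
print(adamw_step(w, grad, state))
```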