Surely the roots, if we skip over the early perceptron work, are in backpropagation and Hinton, and the work going on at Edinburgh and elsewhere in the 80s.
Indeed, I remember buying a set of three books of conference papers around that time, titled "Artificial Neural Networks: Proceedings of" whatever the conference was.
No doubt Schmidhuber made important contributions, but I see him pop up claiming to be the 'root' of it all every couple of years.
Hinton did not invent backpropagation.
Related paragraph from Wikipedia:
Modern backpropagation was first published by Seppo Linnainmaa as "reverse mode of automatic differentiation" (1970)[26] for discrete connected networks of nested differentiable functions.[27][28][29]
In 1982, Paul Werbos applied backpropagation to MLPs in the way that has become standard.
Paul Werbos did not apply backprop to MLPs in the clean form described in Hinton's paper, but rather to a kind of autoregressive, non-linear parametrized function with a much narrower application scope.
Both papers are direct applications of the chain rule to compute the gradient of a multivariate function.
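To make the "it's just the chain rule" point concrete, here is a minimal sketch of reverse-mode automatic differentiation on scalars. The `Var` class and the example function f(x, y) = sin(x*y) + x are made up for illustration; this is not Linnainmaa's formulation or anyone's original code, just the general technique: record each operation's local derivative on the way forward, then sweep backward multiplying and accumulating them.

```python
import math


class Var:
    """A scalar node in a computation graph (illustrative reverse-mode AD sketch)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent_node, local partial derivative d(self)/d(parent)).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def sin(self):
        return Var(math.sin(self.value), ((self, math.cos(self.value)),))

    def backward(self):
        # Topologically order the graph so every node follows its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        # Seed d(out)/d(out) = 1, then walk backward applying the chain rule:
        # d(out)/d(parent) += d(out)/d(node) * d(node)/d(parent).
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local


# Example: f(x, y) = sin(x * y) + x at x = 2, y = 3.
x, y = Var(2.0), Var(3.0)
f = (x * y).sin() + x
f.backward()

# Compare with the hand-derived partials:
# df/dx = y*cos(x*y) + 1, df/dy = x*cos(x*y).
print(x.grad, 3.0 * math.cos(6.0) + 1.0)
print(y.grad, 2.0 * math.cos(6.0))
```

Note that `x` feeds into two operations, so its gradient is the sum of two chain-rule paths; that accumulation (the `+=`) is the whole trick, and one backward pass yields every partial derivative at once.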
That's what bugs me about him. So much work has gone into today's models that calling his contributions "the root" isn't really warranted. He's always complaining that Hinton, LeCun, and Bengio get more credit than they deserve, and now he's over-claiming himself.
Both can be right.
They could be, but they really aren't.
Name a single aspect of something modern, like the Transformer architecture or how it is trained, that is even indirectly attributable to Schmidhuber.
No doubt he'd be jumping up and down wanting to take credit for residual connections, but where was Schmidhuber in the ImageNet era when everyone else was discovering how to build deep neural nets? Why didn't Schmidhuber invent ResNets himself, instead of waiting until someone else (Kaiming He) did and then claiming credit for it?
I'll bet Schmidhuber isn't done yet... when someone eventually comes up with an architecture for AGI, Schmidhuber will come out of the woodwork and point to a note he made on a napkin in 1800 that predicted it all.
Surely the roots go back to Turing, Gödel, Hilbert, Frege, Leibniz, Aristotle.