As stated, I always assumed it came from formulations like the Euler-Lagrange procedures in mechanics that are used in numerical methods for differential geometry. In fact, when I recreated the algorithm as an exercise, it immediately reminded me of gradient descent for kinematics, with the per-layer Jacobian calculation resembling an iterative pose calculation in generalized coordinates. I never thought of it as something "novel".
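
To make the analogy concrete, here is a rough sketch of what I mean, not anyone's reference implementation. It assumes a toy network of `tanh` layers without biases and a squared-norm loss; the names `forward`, `loss`, and `input_gradient` and the example numbers are made up for illustration. The gradient is pushed backwards through one layer Jacobian at a time, much like chaining joint Jacobians in a kinematic chain.

```python
import math

def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(weights, x):
    """Return the activations of every layer, input included."""
    acts = [x]
    for W in weights:
        acts.append([math.tanh(z) for z in matvec(W, acts[-1])])
    return acts

def loss(weights, x):
    # Assumed loss for the sketch: half the squared norm of the output.
    out = forward(weights, x)[-1]
    return 0.5 * sum(o * o for o in out)

def input_gradient(weights, x):
    """Gradient of the loss w.r.t. x, one Jacobian-transpose product per layer."""
    acts = forward(weights, x)
    grad = list(acts[-1])  # d(0.5 * |out|^2) / d(out) = out
    for W, a_out in zip(reversed(weights), reversed(acts[1:])):
        # Layer Jacobian is diag(1 - a_out^2) @ W; apply its transpose to grad.
        delta = [g * (1.0 - a * a) for g, a in zip(grad, a_out)]
        grad = [sum(W[i][j] * delta[i] for i in range(len(W)))
                for j in range(len(W[0]))]
    return grad

# Hypothetical 2-layer example with made-up numbers.
weights = [[[0.5, -0.3], [0.8, 0.1]], [[-0.4, 0.9], [0.2, 0.7]]]
x = [0.6, -1.1]
grad = input_gradient(weights, x)
```

The same `delta` vectors are what you would reuse to get the weight gradients; I only carry the gradient back to the input here to keep the chain of Jacobians visible.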