Reverse mode automatic differentiation is not integration. It's still differentiation, just a different method of calculating the derivative than the one you'd write down by hand. It basically applies the chain rule in the opposite order from the one most people find intuitive.

It has a lot more overhead than regular forward-mode autodiff because you need to cache values from running the function and refer back to them in reverse order, but the advantage is that for functions with many inputs and very few outputs (the classic example is calculating the gradient of a scalar function in a high-dimensional space, as in gradient descent), it is algorithmically more efficient and requires only one pass through the primal function.
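A minimal sketch of what that looks like (the `Var` class and names here are hypothetical, not any particular library): each operation records its inputs and local partial derivatives during the forward pass, and a single reverse sweep over that record accumulates the gradient with respect to every input at once.

```python
# Minimal tape/graph-based reverse-mode AD sketch (hypothetical, not a real library).
class Var:
    def __init__(self, value, parents=()):
        self.value = value        # cached primal value
        self.parents = parents    # list of (parent Var, local partial derivative)
        self.grad = 0.0           # filled in during the reverse sweep

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def backward(output):
    # Seed the output, then walk the recorded operations in reverse order,
    # pushing each node's gradient back to its parents via the chain rule.
    output.grad = 1.0
    order, visited = [], set()
    def topo(v):
        if id(v) in visited:
            return
        visited.add(id(v))
        for parent, _ in v.parents:
            topo(parent)
        order.append(v)
    topo(output)
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad

# Gradient of f(x) = x0*x1 + x1*x2 w.r.t. all three inputs,
# from one forward pass plus one reverse pass.
xs = [Var(2.0), Var(3.0), Var(4.0)]
y = xs[0] * xs[1] + xs[1] * xs[2]
backward(y)
print([x.grad for x in xs])   # [3.0, 6.0, 3.0]
```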

On the other hand, traditional forward-mode derivatives are most efficient for functions with very few inputs but many outputs. It's essentially a duality relationship.
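For contrast, here's a forward-mode sketch using dual numbers (again hypothetical names, not a real library): one pass carries a single input direction through the computation and yields the derivative of every output along that direction, so outputs come for free but you need one pass per input.

```python
# Minimal dual-number forward-mode sketch (hypothetical, not a real library).
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value    # primal value
        self.deriv = deriv    # derivative along the chosen input direction

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

# One input, two outputs: a single forward pass (seeded with deriv=1.0)
# gives the derivative of both outputs at once.
x = Dual(2.0, 1.0)
y1 = x * x          # x^2
y2 = x * x * x      # x^3
print(y1.deriv, y2.deriv)   # 4.0 12.0  (2x and 3x^2 at x=2)
```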

I don't think most people think to do either direction by hand; it's all just matrix multiplication, and you can multiply the Jacobians in whatever order is easier.
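As a concrete illustration (shapes made up for the example): the full derivative of a composition is a product of Jacobians, associativity lets you group the product either way, and for a scalar output the right-to-left grouping (reverse mode) only ever forms row-vector-times-matrix products instead of a large intermediate matrix.

```python
import numpy as np

# Jacobians of h: R^1000 -> R^50, g: R^50 -> R^50, f: R^50 -> R^1
# (dimensions chosen just for illustration).
rng = np.random.default_rng(0)
Jh = rng.standard_normal((50, 1000))
Jg = rng.standard_normal((50, 50))
Jf = rng.standard_normal((1, 50))

# Same chain-rule product, two groupings:
forward_order = Jf @ (Jg @ Jh)   # builds a 50x1000 intermediate (forward-mode flavor)
reverse_order = (Jf @ Jg) @ Jh   # only ever forms 1x50 row vectors (reverse-mode flavor)
print(np.allclose(forward_order, reverse_order))   # True: identical result, different cost
```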

I'm just talking about the general algorithm for writing down the derivative of `f(g(h(x)))` using the chain rule.

For vector-valued functions, the naive way you'd learn it in a vector calculus class corresponds to forward-mode AD.
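If you want to see both accumulation orders on an actual composition, JAX exposes them directly as `jax.jacfwd` and `jax.jacrev` (the composed function below is an arbitrary example, not anything from the discussion above):

```python
import jax
import jax.numpy as jnp

# An arbitrary composition R^3 -> R^2 for illustration.
def h(x):
    return jnp.sin(x)

def g(x):
    return x ** 2

def f(x):
    return jnp.array([jnp.sum(x), jnp.prod(x)])

def composed(x):
    return f(g(h(x)))

x = jnp.array([0.1, 0.2, 0.3])
J_fwd = jax.jacfwd(composed)(x)    # forward-mode accumulation
J_rev = jax.jacrev(composed)(x)    # reverse-mode accumulation
print(jnp.allclose(J_fwd, J_rev))  # True: same Jacobian, different multiplication order
```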