I don't think most people think to do either direction by hand; it's all just matrix multiplication, so you can multiply the Jacobians in whatever order makes it easier.
I'm just talking about the general algorithm to write down the derivative of `f(g(h(x)))` using the chain rule.
For vector-valued functions, the naive way you would learn in a vector calculus class corresponds to forward mode AD.
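
To make the "whatever order" point concrete, here is a rough sketch in plain Python. The three Jacobians are made-up numbers standing in for `h: R^3 -> R^3`, `g: R^3 -> R^3` and `f: R^3 -> R` at some point `x`, and `matmul` is just a helper I wrote for illustration, not anything from an AD library. Both orders give the same derivative; only the amount of work differs.

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows; also return the scalar-multiply count."""
    rows, inner, cols = len(a), len(b), len(b[0])
    product = [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
               for i in range(rows)]
    return product, rows * inner * cols

# Hypothetical Jacobians evaluated at some point x (values are made up).
J_h = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]   # 3x3, Jacobian of h
J_g = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]   # 3x3, Jacobian of g
J_f = [[1, -1, 2]]                        # 1x3, Jacobian of scalar-valued f

# Forward-mode order: start from the input side, J_f (J_g J_h).
inner_fwd, cost_a = matmul(J_g, J_h)
forward, cost_b = matmul(J_f, inner_fwd)
forward_cost = cost_a + cost_b

# Reverse-mode order: start from the output side, (J_f J_g) J_h.
inner_rev, cost_c = matmul(J_f, J_g)
reverse, cost_d = matmul(inner_rev, J_h)
reverse_cost = cost_c + cost_d

print(forward, forward_cost)   # [[13, 7, 18]] 36
print(reverse, reverse_cost)   # [[13, 7, 18]] 18
```

With a scalar output the reverse order only ever carries a row vector, which is why it is cheaper here; with a single input and many outputs the forward order would win instead.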