Today You Learn that the same Shun'ichi Amari who founded information geometry also made early contributions to automatic differentiation (AD).
Gradient descent itself is much, much older: iterating along the negative gradient follows almost immediately once the gradient is defined, regardless of how that gradient is computed.
AD vs. { symbolic differentiation, numeric finite "differentiation" } is about the insight of how to compute an exact numeric gradient efficiently, in both memory (space) and time (compute).
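To make that contrast concrete, here is a minimal reverse-mode AD sketch (names like `Var` and `backward` are illustrative, not any particular library's API): one backward sweep over the recorded operations yields exact partials for every input at a cost proportional to the forward computation, whereas finite differences need one extra function evaluation per input dimension and only approximate the gradient.

```python
# Minimal sketch of reverse-mode AD: a "tape" of scalar operations.
# Illustrative only; not the API of any real autodiff library.

class Var:
    """A scalar that records how it was computed, so gradients can flow back."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def backward(output):
    """One reverse sweep: cost scales with the number of recorded operations,
    independent of the number of inputs."""
    # Post-order DFS gives a topological order; reversed, each node is
    # processed only after all of its consumers have contributed to its grad.
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(output)
    output.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad

# f(x, y) = x*y + x, evaluated once; both partials come from a single sweep.
x, y = Var(3.0), Var(4.0)
z = x * y + x
backward(z)
print(x.grad, y.grad)  # 5.0 3.0 (exact up to floating point)

# Finite differences, by contrast, cost one extra evaluation of f per input
# dimension and carry truncation error:
def f(x, y):
    return x * y + x

h = 1e-6
print((f(3.0 + h, 4.0) - f(3.0, 4.0)) / h)  # ~5.0, approximate only
```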