I don't know of a specific paper on this, but Ben Recht has a post discussing the relationship between backpropagation and techniques from optimal control that became prominent in the 1960s:

https://archives.argmin.net/2016/05/18/mates-of-costate/
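
To make the connection a bit more concrete, here is a minimal sketch of my own (not code from the post). It treats a tiny scalar "network" as a discrete-time control system, where the weights play the role of the controls, and computes the gradient with a backward costate recursion, which is the same computation backpropagation does. The dynamics `x_{t+1} = tanh(w_t * x_t)`, the quadratic terminal cost, and all function names are assumptions chosen purely for illustration.

```python
import math


def rollout(weights, x0):
    """Forward pass: simulate x_{t+1} = tanh(w_t * x_t), keeping every state."""
    states = [x0]
    for w in weights:
        states.append(math.tanh(w * states[-1]))
    return states


def cost(weights, x0, target):
    """Terminal cost J = 0.5 * (x_T - target)^2 (an illustrative choice)."""
    return 0.5 * (rollout(weights, x0)[-1] - target) ** 2


def costate_gradient(weights, x0, target):
    """Backward pass: propagate the costate lam_t = dJ/dx_t from t = T down to 0."""
    states = rollout(weights, x0)
    lam = states[-1] - target  # terminal condition: lam_T = dJ/dx_T
    grads = [0.0] * len(weights)
    for t in reversed(range(len(weights))):
        local = 1.0 - states[t + 1] ** 2  # tanh'(w_t * x_t), using x_{t+1} = tanh(.)
        grads[t] = lam * local * states[t]  # dJ/dw_t
        lam = lam * local * weights[t]  # costate update: lam_t from lam_{t+1}
    return grads


def finite_difference_gradient(weights, x0, target, eps=1e-6):
    """Central-difference check that does not use the backward recursion."""
    grads = []
    for t in range(len(weights)):
        up, down = list(weights), list(weights)
        up[t] += eps
        down[t] -= eps
        grads.append((cost(up, x0, target) - cost(down, x0, target)) / (2 * eps))
    return grads
```

Calling `costate_gradient([0.5, -1.2, 0.8], 0.7, 0.3)` and `finite_difference_gradient` with the same arguments should agree up to numerical error. In control terms the variable `lam` is the costate (adjoint); in neural-network terms it is the backpropagated error signal.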