Any reason for creating a new tensor when accumulating grads instead of updating the existing one?
Edit: I asked this before I read the design decisions. The reasoning, as far as I understand, is that for simplicity there are no in-place operations, hence accumulation is done on a new tensor.
yeah, exactly. it's for explicit ownership transfer. you always own what you receive, sum it, release both inputs, done. no mutation tracking, no aliasing concerns.
https://github.com/sueszli/autograd.c/blob/main/src/autograd...
i wonder whether there is a more clever way to do this without sacrificing simplicity.