I wrote an explicit reverse pass (no autodiff), and it was 8x faster in Python.

Can you share a link?