I was wondering how well this would work :) You can definitely push this further. The question is: how well do the gradients and updates compress?