I wonder if one could store only the binary representation during training and sample a floating-point representation (of both weights and gradients) during backprop.

Backpropagation on random data that is then thrown away would be pretty useless.
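
To make the idea concrete, here is a rough sketch of one way it could be read, not anything proposed in detail above. It assumes the stored bit is each weight's sign, that the float magnitude is drawn fresh from a half-normal distribution on every step, and that the model is a plain linear regressor with a mean squared error loss. The names `sample_float_weights` and `train_step` and the parameters `scale` and `lr` are made up for illustration.

```python
import numpy as np

def sample_float_weights(sign_bits, rng, scale=0.1):
    # Assumption: each stored bit is the weight's sign; the magnitude is
    # drawn fresh from a half-normal distribution on every call.
    magnitude = np.abs(rng.normal(0.0, scale, size=sign_bits.shape))
    return np.where(sign_bits, 1.0, -1.0) * magnitude

def train_step(sign_bits, x, y, rng, lr=0.1, scale=0.1):
    w = sample_float_weights(sign_bits, rng, scale)  # temporary float weights
    grad = 2.0 * x.T @ (x @ w - y) / len(y)          # gradient of mean squared error
    w = w - lr * grad                                # update the temporary floats
    return w >= 0                                    # keep only the sign bits

# Hypothetical usage on random toy data
rng = np.random.default_rng(0)
bits = rng.random(4) < 0.5
x = rng.normal(size=(8, 4))
y = rng.normal(size=8)
bits = train_step(bits, x, y, rng)
```

In this reading the sampled magnitudes are discarded after every step, so the only thing that carries over between steps is whether an update happened to flip a sign, which is the concern raised in the reply.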