Yes, that's been the downside of these forever.
If you use quantized differentiation, you can get away with using integers for the gradient updates. Explaining how takes a whole paper, and in the end it doesn't even work very well.
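Roughly the idea, as a minimal sketch: quantize each gradient tensor to int8 with a per-tensor scale, then dequantize when applying the step. This is a generic symmetric-quantization illustration, not the scheme from any particular paper; the names (`quantize_grad`, `INT_SCALE`) are made up.

```python
import numpy as np

INT_SCALE = 127  # map the largest |gradient| entry onto the int8 range


def quantize_grad(grad: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a float gradient to int8 plus a per-tensor scale factor."""
    max_abs = float(np.max(np.abs(grad))) or 1.0
    scale = max_abs / INT_SCALE
    q = np.clip(np.round(grad / scale), -INT_SCALE, INT_SCALE).astype(np.int8)
    return q, scale


def apply_update(weights: np.ndarray, q_grad: np.ndarray, scale: float, lr: float) -> np.ndarray:
    """Dequantize the integer gradient and take a plain SGD step."""
    return weights - lr * (q_grad.astype(np.float32) * scale)


# Example: one update step on a toy weight vector.
w = np.random.randn(4).astype(np.float32)
g = np.random.randn(4).astype(np.float32)
q, s = quantize_grad(g)
w = apply_update(w, q, s, lr=0.01)
```

The integers only buy you cheaper storage and communication of the gradients; the quantization error each step is where the "doesn't work very well" part comes in.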
At university, way back at the end of the last AI winter, I ended up using genetic algorithms to train the models. It was very interesting because the weights were trained along with the hyperparameters. It was nowhere near practical, though, because gradient descent is so much better at getting real-world results in reasonable time frames - surprisingly, partly because it's more memory efficient.
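For anyone who hasn't seen it, here's a toy sketch of that kind of setup (not my actual university code): each genome carries both the weights of a tiny linear model and one "hyperparameter" of its own, here a per-genome mutation scale, so both evolve together. The task and all names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

POP, GENS, DIM = 32, 200, 3


def fitness(genome):
    w = genome[:DIM]
    return -np.mean((X @ w - y) ** 2)  # negative MSE, higher is better


# Genome = [weights..., mutation_scale]; the hyperparameter evolves with the weights.
pop = np.concatenate([rng.normal(size=(POP, DIM)),
                      np.abs(rng.normal(0.5, 0.1, size=(POP, 1)))], axis=1)

for _ in range(GENS):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[-POP // 2:]]          # keep the better half
    children = parents.copy()
    sigmas = np.abs(children[:, DIM:DIM + 1])               # each child's own mutation scale
    children[:, :DIM] += rng.normal(size=(len(children), DIM)) * sigmas
    children[:, DIM] += rng.normal(scale=0.05, size=len(children))  # mutate the hyperparameter too
    pop = np.concatenate([parents, children])

best = pop[np.argmax([fitness(g) for g in pop])]
print("best weights:", best[:DIM], "mutation scale:", best[DIM])
```

It only ever stores the population, no gradients or optimizer state, which is why the memory footprint is so small; the catch is the thousands of full fitness evaluations it takes to get anywhere.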