>could be interpreted as a binary which could be converted straight into an ASCII character for comparison and backpropagation.

There's nothing to backpropagate through a step function: the derivative is zero everywhere except at the step, where it's undefined.

It sounds like jongjong was probably using surrogate gradients: you keep the step activation in the forward pass but replace it with a smooth approximation in the backward pass.
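
Something like this in PyTorch, for example (the steep-sigmoid surrogate and the k = 10 steepness are just illustrative choices, not necessarily what jongjong did):

```python
import torch

class StepWithSurrogate(torch.autograd.Function):
    """Hard step in the forward pass, steep-sigmoid gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()                      # binary output, derivative 0 almost everywhere

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        k = 10.0                                    # surrogate steepness (arbitrary choice)
        sig = torch.sigmoid(k * x)
        return grad_output * k * sig * (1.0 - sig)  # gradient of sigmoid(k * x)

step = StepWithSurrogate.apply

# quick check: gradients flow even though the forward output is binary
x = torch.randn(4, requires_grad=True)
step(x).sum().backward()
print(x.grad)                                       # nonzero, unlike a plain (x > 0)
```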

Yeah, but then there's no performance benefit over plain old SGD.

Yeah, I think surrogate gradients are usually used to train spiking neural nets where the binary nature is considered an end in itself, for reasons of biological plausibility or something. Not for any performance benefits. It's not an area I really know that much about though.

There are performance benefits when they're implemented in hardware. The brain is a mixed-signal system whose massively parallel, tiny, analog components keep it ultra-fast at ultra-low energy.

Analog NNs, including spiking ones, share some of those properties. Several chips, like TrueNorth, are designed to take advantage of that on the biologically inspired side. Others, like Mythic AI's, are accelerating more conventional ML models.

[deleted]

I can't remember the name of the algorithm we used. It wasn't doing gradient descent, but it worked on a similar principle: basically, adjust each weight up or down by a fixed step according to its contribution to the error. It was much simpler than calculating gradients, but it still gave pretty good results for single-character recognition.
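
Something roughly like this perceptron-style fixed-step rule, although the names and details below are guesses for illustration, not the original code:

```python
import numpy as np

def predict(w, b, x):
    """Single binary neuron with a hard step activation."""
    return 1.0 if x @ w + b > 0 else 0.0

def fixed_step_update(w, b, x, target, step=0.01):
    """No gradients: nudge each weight by a fixed step, signed by its share of the error."""
    error = target - predict(w, b, x)       # -1, 0, or +1
    w = w + step * error * np.sign(x)       # only inputs that were active get adjusted
    b = b + step * error
    return w, b

# toy usage: learn to fire on one 8x8 "character" bitmap and stay silent on a blank one
rng = np.random.default_rng(0)
w, b = rng.normal(scale=0.1, size=64), 0.0
x_on = rng.integers(0, 2, 64).astype(float)
x_off = np.zeros(64)
for _ in range(100):
    w, b = fixed_step_update(w, b, x_on, 1.0)
    w, b = fixed_step_update(w, b, x_off, 0.0)
print(predict(w, b, x_on), predict(w, b, x_off))   # ideally 1.0 and 0.0
```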