This reminds me of my university days. For one of the assignments, we had to write our own ANN from scratch for handwriting recognition, and we implemented a step activation function because that was easier than sigmoid. Basically, each layer would output ones and zeros, though the weights themselves were still real-valued; it was just the node outputs that were 1 or 0. But this was convenient because the output of the final layer could be interpreted as a binary which could be converted straight into an ASCII character for comparison and backpropagation.
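The decoding trick can be sketched roughly like this. It assumes 7 output nodes read most significant bit first and a threshold of 0; those choices and the function names are my guesses, not the original assignment's.

```python
def step(x: float) -> int:
    """Heaviside step activation: 1 if the input reaches the threshold (0), else 0."""
    return 1 if x >= 0 else 0


def decode_outputs(pre_activations: list[float]) -> str:
    """Step-activate each output node, read the bits as an unsigned binary
    number (most significant bit first) and map that number to a character."""
    code = 0
    for z in pre_activations:
        code = (code << 1) | step(z)
    return chr(code)


def target_bits(char: str, width: int = 7) -> list[int]:
    """Encode the expected character as 0/1 targets, most significant bit first,
    so each output node can be compared against its own bit."""
    code = ord(char)
    return [(code >> i) & 1 for i in reversed(range(width))]


# 'A' is 65, i.e. 1000001 in 7 bits.
outputs = [0.9, -1.2, -0.3, -2.0, -0.1, -0.5, 1.4]
print(decode_outputs(outputs), target_bits("A"))
```

Comparing bit by bit against `target_bits` gives a per-node error signal, which is what any weight-update rule would need.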

>could be interpreted as a binary which could be converted straight into an ASCII character for comparison and backpropagation.

There's nothing to backpropagate with a step function. Its derivative is zero everywhere except at the threshold, where it's undefined.
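You can see this numerically with a central finite-difference check. A minimal sketch; the step size `h` and the names are arbitrary choices:

```python
def step(x: float) -> float:
    """Heaviside step: 1.0 for x >= 0, else 0.0."""
    return 1.0 if x >= 0 else 0.0


def numerical_derivative(f, x: float, h: float = 1e-4) -> float:
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in [-2.0, -0.5, 0.5, 2.0]:
    # 0.0 at every point away from the threshold: no signal to propagate back
    print(x, numerical_derivative(step, x))

# At the threshold itself the estimate is 1 / (2h), which blows up as h shrinks.
print(0.0, numerical_derivative(step, 0.0))
```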

It sounds like jongjong was using surrogate gradients: you keep the step activation in the forward pass but replace its derivative with that of a smooth approximation (e.g. a sigmoid) in the backward pass.
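A minimal single-neuron sketch of the idea, assuming a squared-error loss and the derivative of a steep sigmoid (slope `beta`, an arbitrary choice) as the surrogate. The toy AND dataset, learning rate and function names are illustrative, not anyone's actual assignment code.

```python
import math


def step(x: float) -> float:
    """Forward activation: hard 0/1 output."""
    return 1.0 if x >= 0 else 0.0


def surrogate_grad(x: float, beta: float = 5.0) -> float:
    """Derivative of sigmoid(beta * x), used in place of the step's zero derivative."""
    s = 1.0 / (1.0 + math.exp(-beta * x))
    return beta * s * (1.0 - s)


def train(data, lr: float = 0.1, epochs: int = 100):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x0, x1), target in data:
            z = w[0] * x0 + w[1] * x1 + b
            y = step(z)  # forward pass uses the real step function
            # Loss L = 0.5 * (y - target)^2, so dL/dy = y - target;
            # the backward pass pretends dy/dz = surrogate_grad(z).
            dz = (y - target) * surrogate_grad(z)
            w[0] -= lr * dz * x0
            w[1] -= lr * dz * x1
            b -= lr * dz
    return w, b


and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(and_data)
predictions = [step(w[0] * x0 + w[1] * x1 + b) for (x0, x1), _ in and_data]
print(predictions)
```

Only the gradient is faked; the neuron's output is always exactly 0 or 1. Once every output matches its target, `y - target` is zero and the weights stop changing.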

Yeah, but then there's no performance benefit over plain old SGD on a regular network.

Yeah, I think surrogate gradients are usually used to train spiking neural nets, where the binary nature is considered an end in itself, for reasons of biological plausibility or something, not for any performance benefit. It's not an area I really know that much about, though.

There are performance benefits when they're implemented in hardware. The brain is a mixed-signal system whose massively parallel, tiny, analog components make it ultra-fast at ultra-low energy.

Analog NNs, including spiking ones, share some of those properties. Several chips, like IBM's TrueNorth, are designed to take advantage of that on the biological side. Others, like Mythic AI's, accelerate conventional ML models.

I can't remember the name of the algorithm we used. It wasn't gradient descent, but it worked on a similar principle: basically, adjust each weight up or down by a fixed rate times its contribution to the error. It was much simpler than calculating gradients, but it still gave pretty good results for single-character recognition.
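That sounds a lot like the classic perceptron learning rule, though that's only a guess at what the algorithm was. A minimal sketch of that rule for a single layer of step-activated output nodes; the toy data, learning rate and names are all hypothetical:

```python
def step(z: float) -> int:
    return 1 if z >= 0 else 0


def predict(weights, biases, x):
    """One step-activated output per node; each node has its own weight vector."""
    return [step(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, biases)]


def train_perceptron(samples, n_inputs, n_outputs, lr=0.5, epochs=50):
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    for _ in range(epochs):
        for x, targets in samples:
            outputs = predict(weights, biases, x)
            for j, (y, t) in enumerate(zip(outputs, targets)):
                error = t - y  # -1, 0 or +1: no derivative needed
                # Nudge each weight by a fixed rate times its input's contribution.
                for i, x_i in enumerate(x):
                    weights[j][i] += lr * error * x_i
                biases[j] += lr * error
    return weights, biases


# Hypothetical toy data: two output bits computing AND and OR of two inputs.
samples = [((0, 0), (0, 0)), ((0, 1), (0, 1)), ((1, 0), (0, 1)), ((1, 1), (1, 1))]
weights, biases = train_perceptron(samples, n_inputs=2, n_outputs=2)
print([predict(weights, biases, x) for x, _ in samples])
```

Note that this rule only updates the weights feeding the output nodes directly; it doesn't say how to assign blame to hidden units, which is the problem backpropagation solves.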