What's the point of the relu in the loss function? Its inputs are nonnegative anyway.

Let's try to keep things positive.

I wondered the same. Seems like it would just make a V-shaped loss around zero, but abs has that property already!

ReLU would have made it flat below zero ( _/ not \/). Adding the abs first just makes the ReLU do nothing.
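
If it helps to see it, a quick check in NumPy (the relu helper is mine, just maximum(x, 0)):

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    x = np.linspace(-3, 3, 7)      # inputs of both signs
    print(relu(x))                 # flat below zero: _/
    print(np.abs(x))               # V-shaped: \/

    # relu after abs is a no-op: abs output is already nonnegative
    assert np.array_equal(relu(np.abs(x)), np.abs(x))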

In reality it’s probably not a ReLU anyway; modern LLMs use GeLU or something more advanced.
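
For the curious, here's the difference in shape, using the common tanh approximation of GeLU (Hendrycks & Gimpel, 2016). This is just a sketch, not what any particular model ships:

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def gelu(x):
        # tanh approximation of GeLU (Hendrycks & Gimpel, 2016)
        return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

    x = np.linspace(-3, 3, 13)
    print(relu(x))   # hard zero for x < 0
    print(gelu(x))   # smooth, dips slightly negative for moderate negative x, then tracks x

Unlike ReLU, GeLU is differentiable everywhere and passes a little signal for negative inputs.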

Sometimes a cosmic ray might hit the sign bit of a register and flip the value negative. So it is useful to pass it through a rectifier to ensure it's never negative, even in this rare case.

Indeed, we should call all idempotent functions twice just in case the first incantation fails to succeed.

In all seriousness, this is not at all how resilience to cosmic interference works in practice, and a flip in any executed instruction, or in any other bit, is far more likely than a flip in the one specific bit you are addressing.

I thought the belt and braces approach was a valuable contribution to AI safety. Better safe than sorry with these troublesome negative numbers!

Well, I guess it's helping to distinguish authors who are doing arithmetic they understand from ones who are copying received incantations around...