Cool, although more ILP (instruction-level parallelism) might not necessarily be better on a modern GPU, which doesn't have much ILP, if any (instead it uses those resources to execute several threads in parallel).
That might explain why the original Cg (a GPU programming language) code did not use Estrin's, since at least the code in the post does add 1 extra op (squaring `abs_x`).
(AMD GPUs used to use VLIW (very long instruction word) which is "static" ILP).