I did some manual golfing with nand2tetris assembly and came up with hacks similar to the max() implementation, where one appropriates an arbitrary, conveniently placed memory address.

After reading the article, though, I feel like I definitely need a superoptimiser, to see what could be improved :)