I agree, but there’s a tiny caveat that this is for one specific benchmark that, I think, doesn’t reflect most real-world code.

I’m basing that on the 1.6% improvement they got on speeding up sqrt. That surprised me, because, to get such an improvement, the benchmark must spend over 1.6% of its time in there, to start with.

Looking in the git repo, it seems that did happen in the nbody simulation (https://github.com/pizlonator/zef/blob/master/ScriptBench/nb...).