Check out the C++ version at 208.3 GiB/s, 3x faster than the asm one.

Yeah, because (and here's the trick) they are clever and do less work.

Optimizing usually means "find a way to get the same result with less work".
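A toy sketch of what "less work" can look like. This is my own illustration, not the code behind the numbers above, and the function names `sum_loop` and `sum_closed_form` are made up for the example. Both return the sum 1 + 2 + ... + n, but the second skips the loop entirely:

```cpp
#include <cstdint>

// Straightforward version: visits every integer from 1 to n.
// (Hypothetical helper; assumes n is well below the uint64_t maximum.)
std::uint64_t sum_loop(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// "Lazy" version: same result from the closed form n(n+1)/2, no loop.
// The even factor is halved first so the intermediate product stays smaller.
std::uint64_t sum_closed_form(std::uint64_t n) {
    return (n % 2 == 0) ? (n / 2) * (n + 1) : n * ((n + 1) / 2);
}
```

Same answer either way; the second one just does a handful of arithmetic operations instead of n additions. A good compiler may well spot this particular pattern on its own, so treat it as an illustration of the idea rather than a benchmark.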

Hire the laziest programmer :)