> In modern CPUs a mispredicted branch is much more expensive than a memory write.

Mostly because of caching. Each write goes either to the same address as the previous one or to one only a small increment further along, so nearly all of them should hit the L1 cache. If the loop instead wrote to a random memory location on every iteration, the cost of a misprediction would probably disappear in the noise next to the cache misses.
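To make the access pattern concrete, here is a rough sketch of what I assume the loop looks like: a filter that copies values below some limit. The function names, the `limit` parameter, and the filter itself are my own illustration, not code from the thread. The branch-free version stores on every iteration but only advances the output index by 0 or 1, so each store lands on the previous slot or the one right after it.

```c
#include <stddef.h>

/* Branchy version: the store happens only when the test passes,
   so the CPU has to predict the outcome of (src[i] < limit). */
size_t filter_branchy(const int *src, size_t n, int limit, int *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] < limit) {
            dst[out++] = src[i];
        }
    }
    return out;
}

/* Branch-free version: always store, then advance the output index
   by 0 or 1. A rejected value is simply overwritten by the next store.
   dst must have room for n elements. */
size_t filter_branchless(const int *src, size_t n, int limit, int *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        dst[out] = src[i];                 /* same slot as last time, or the next one */
        out += (size_t)(src[i] < limit);   /* comparison yields 0 or 1 */
    }
    return out;
}
```

Both functions return the number of values kept and produce the same output prefix. Whether the first one actually compiles to a conditional jump depends on the compiler and optimization flags, so check the generated assembly before drawing conclusions from a benchmark.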