I went in expecting to find 'branch prediction'[0] as the answer, but apparently things are even more complex nowadays.

[0] - https://stackoverflow.com/questions/11227809/why-is-conditio...

>I went in expecting to find 'branch prediction'[0]

GPUs do branch prediction? I thought they didn't bother and try to minimize wasted effort by using high amounts of concurrent threads?

They do texture prefetching, which is sorta similar.

To be fair, the culprit in the article is _less complex_ than branch prediction: "with random data, bits are flipped often, and bit flips in transistors inherently draw power" is less mental gymnastics than "with random data, the cpu fails to predict the future, causing redundant speculative execution"

But why do we expect random data to result in more bit flips? That seems harder to argue than the mechanics of a basic branch prediction system.

I expected a “torch is smart enough to keep track of cases where it just initialized the C in C <= A*B+C to zero, avoiding the add” type situation but I was wrong.

That's exactly what I thought.