Because quicksort is like a worst case scenario for branch prediction.
Branch prediction works when the CPU can detect some kind of pattern in the program behavior, like if the branch usually goes one way and rarely goes the other way. If it predicts correctly, the CPU pipeline keeps going without stalling. If it predicts incorrectly, the CPU wastes time doing work that it has to throw away. Therefore, being able to predict correctly is essential.
In quicksort, you partition an array into two halves based on whether each element is smaller or larger than the pivot value. The key to good quicksort performance is to choose a good pivot which divides the values in half about equally. This means (assuming you're sorting randomly-ordered data) that the branching behavior will be totally unpredictable, like flipping a coin. So branch prediction will be basically useless.
Copying small chunks of data is OK if it all fits in cache, especially if it all fits in L1 cache. It's not ideal to copy data unnecessarily, but if it allows you to keep the CPU pipeline running at full speed with no stalls, it can be a good trade-off.
In modern CPUs a mispredicted branch is much more expensive than a memory write.
The unsaid assumption is that the array is filled with random values between 0 and 1000, so the "if" condition has a 50% of chances to be true. The branch is mispredicted 50% of times.
Of course this trick won't work when the statement protected by "if" is a more complex and costly action, or one that can't be undone (in the example, note that when the counter is not incremented, the value written to memory will be overwritten in the next cycle, so it's "undone" in a certain way).
> In modern CPUs a mispredicted branch is much more expensive than a memory write.
Mostly because of caching. The writes either go to the same address as a previous one or move only a small increment, so most writes are likely going to hit L1 cache. If it wrote to a random memory location after every iteration the cost of a misprediction would probably disappear in the noise.
Modern processors are pipelined, where they run a lot in advance of when the result is needed. A mispredicted branch requires throwing out all the advance calculations on the incorrectly followed path. The branch predictor can't predict branches like this based on data that tends to equally be distributed for taken and not taken. The memory write is fine because it's not conditional, so it can be pipelined along with everything else.
Because quicksort is like a worst case scenario for branch prediction.
Branch prediction works when the CPU can detect some kind of pattern in the program behavior, like if the branch usually goes one way and rarely goes the other way. If it predicts correctly, the CPU pipeline keeps going without stalling. If it predicts incorrectly, the CPU wastes time doing work that it has to throw away. Therefore, being able to predict correctly is essential.
In quicksort, you partition an array into two halves based on whether each element is smaller or larger than the pivot value. The key to good quicksort performance is to choose a good pivot which divides the values in half about equally. This means (assuming you're sorting randomly-ordered data) that the branching behavior will be totally unpredictable, like flipping a coin. So branch prediction will be basically useless.
Copying small chunks of data is OK if it all fits in cache, especially if it all fits in L1 cache. It's not ideal to copy data unnecessarily, but if it allows you to keep the CPU pipeline running at full speed with no stalls, it can be a good trade-off.
In modern CPUs a mispredicted branch is much more expensive than a memory write.
The unsaid assumption is that the array is filled with random values between 0 and 1000, so the "if" condition has a 50% of chances to be true. The branch is mispredicted 50% of times.
Of course this trick won't work when the statement protected by "if" is a more complex and costly action, or one that can't be undone (in the example, note that when the counter is not incremented, the value written to memory will be overwritten in the next cycle, so it's "undone" in a certain way).
> In modern CPUs a mispredicted branch is much more expensive than a memory write.
Mostly because of caching. The writes either go to the same address as a previous one or move only a small increment, so most writes are likely going to hit L1 cache. If it wrote to a random memory location after every iteration the cost of a misprediction would probably disappear in the noise.
Modern processors are pipelined, where they run a lot in advance of when the result is needed. A mispredicted branch requires throwing out all the advance calculations on the incorrectly followed path. The branch predictor can't predict branches like this based on data that tends to equally be distributed for taken and not taken. The memory write is fine because it's not conditional, so it can be pipelined along with everything else.
[dead]