Hacker News

    for (int i = 0; i < 1000; i++) {
        small_numbers[smlen] = numbers[i];
        smlen += (numbers[i] < 500);
    }

is much faster than the conventional version with a conditional branch:

    for (int i = 0; i < 1000; i++) {
        if (numbers[i] < 500) {
            small_numbers[smlen] = numbers[i];
            smlen += 1;
        }
    }

Been staring at this for a bit, but my brain is not working properly today: could someone please explain how these to loops compute the same value for small_numbers[smlen]?

edelind 12 hours ago [ - ]

Here is another perspective:

- the first one (branchless) use the condition to SAVE the correct value (< 500): it temporarily writes any current value to the same index i, always overwriting the previous value, effectively saving it (by moving forward to i+1) only when the value is right (small number). Downside of this simple function: the last value may be bigger than 500

- the second one use the condition to ADD the value, when it is 100% sure it is a correct small number

bhaak 12 hours ago [ - ]

They don't. After running, for the values in small_numbers from 0 to smlen-1 they are equivalent.

But if the last value of numbers[] is not smaller than 500, small_numbers[smlen] will contain that value for the first version whereas the second version does not write to small_numbers[smlen].

addaon 14 hours ago [ - ]

> "these two loops compute the same value"

At what sequence point? The branchless version writes to small_numbers[smlen], for any given value of smlen, potentially more than once; so there are observable points of time during the loop where the behavior is different. But after the loop, both contain the final write to small_numbers[i] for all 0 <= I < smlen; and the transient writes both don't change observed external behavior, and are apparently cheaper than fewer but conditional writes.

kleiba2 8 hours ago [ - ]

I think the small_numbers array would differ after the end of the loop if, for instance, numbers contained only numbers >= 500. Am I wrong?

Ukv 8 hours ago [ - ]

smlen would be 0 for both if there are no small numbers, so end result of both is an empty array.

For the first version small_numbers[0] will contain an arbitrary value at the end, and for the second version it happens to contain the last number read, but that address is outside of the 0-length array being returned.

davrosthedalek 8 hours ago [ - ]

The point is that you should look only at the first smlen entries, which would be 0 for this case.

teo_zero 14 hours ago [ - ]

Writing to array[n] and not incrementing n means that the value just written is outside the "useful" range (from 0 to n-1) and will not be considered (it will be overwritten the next iteration).

zelphirkalt 13 hours ago [ - ]

I am rather thinking, if one is so much faster, and they are truly equal, why is the compiler too stupid to convert one into the other?

br121 11 hours ago [ - ]

It doesn't convert bogosort into heapsort either, despite the second being much faster than the first. I'm guessing that it's not that easy going from one to the other because the only thing they have in common is the output (and only after you have checked the last value), so if the transformation is not hard-coded into the compiler, the odds of it randomly discovering the optimization is close to zero

zelphirkalt 11 hours ago [ - ]

Yeah, I would expect such transformations to be implemented as optimizations. Just like maybe (the admitedly simpler):

    (+ ((lambda () 1)) ((lambda () 1))) -> (+ 1 1)

A syntactical transformation, where it is possible as an equivalent transformation.

I may be overlooking special cases, but I thought the compiler is smart enough to infer that the array elements are integers and that `<` will result in a boolean, which is just `0` and `1` and will understand that having only the `if` without `else` branch is equivalent in this case. Guess I was wrong and the compiler is not sophisticated in this specific way.

flohofwoe 12 hours ago [ - ]

The two code snippets do different things, apples and oranges... e.g. the array modification in the second example needs to move in front of the if for the two snippets to behave identically. I bet then the compiler output is the same with -O1 or higher.

PS: e.g. note how bla() (first code snippet) and blob() (fixed second code snippet) have identical output (both are turned into the same 'branchless' code via a conditional 'setl' instruction), but the blub() function (original second code snippet) differs because that function has different behaviour:

https://www.godbolt.org/z/h9Kfbn5bc

TL;DR: most 'branchless advice' that only tinkers with language features (like "x = a ? b : c" instead of an if) is useless because to the optimizer passes both are the same thing (a condition).

When there's a difference in the generated code then it's usually a bug and the before-after code are not actually equivalent (like in the code examples above).

12 hours ago [ - ]

[deleted]

gblargg 14 hours ago [ - ]

It only increments if the number was less than 500, effectively just saving the ones less than 500.

jcul 13 hours ago [ - ]

First version has a side effect of writing to small_numbers[0] always.

The compiler probability can't optimize that in the second version.

If it wrote unconditionally and incremented only in the if then I'd guess they would compile to the same thing.

defrost 14 hours ago [ - ]

numbers[i] < 500

is a conditional (true or false) that evaluates to 1 or 0 (in C)

Therefore smlen has either a 0 or a 1 added to it's value .. equivilent to only adding 1 if True.

10 hours ago [ - ]

[deleted]

11 hours ago [ - ]

[deleted]