NetBurst was supposed to be the application of RISC principles to x86 taken to its extreme (ultra-long pipelines to reduce clock-to-clock delay, highest clock speed possible --- basically reducing work-per-clock and hoping that reduces complexity enough to increase clock speed to compensate.) The ALU was 16 bits, "double pumped" with the carry split between the two, which lead to 32-bit ALU operations that don't carry between the lower and upper halves actually finishing a clock cycle faster than those with a carry.

https://stackoverflow.com/questions/45066299/was-there-a-p4-...