An integer add, sub, or shift is 1 cycle of latency; integer mul is generally 3 cycles; integer div is lol-that-is-slow. Floating-point adds, subs, muls, and fmas are generally 4 cycles, with div being lol-that-is-slow (but generally faster than integer division because your divisor and dividend have fewer bits).

So fixed-point addition and subtraction are definitely faster, multiplication is a wash if you're doing binary-based fixed point (but slower if you're doing decimal-based fixed point), and fixed-point division is definitely slower than floating-point division.

Kudos on providing the numbers. I wasn't confident in the numbers that I remembered and with how pipelined everything is, I didn't know how much to lean on them. Not exactly my standard workflow to care about this level of speed.

My gut would still be that it is typically a wash for most everyone as far as speeds go? If default libraries supported it more directly, I would think it would largely be a win for a lot of reasoning. In particular, silly stuff like 1e32 + 1e1 would not be nearly as surprising to most people. And the entire class of bugs around stuff like doing something until it reaches 0.9 would almost certainly go away if we guaranteed precision to a set number decimal places.

Alas, default libraries do not support this, though. So the above is admittedly wishful thinking on my end. And I could as easily describe a world where people insist on arbitrary numbers of fixed point values and how that would be its own set of landmines.