It's 3 cycles for float multiplication (and 1 for shift right):

3x faster

In throughput it's even less of a difference: 2 per cycle vs 3 per cycle.

50% faster