It's 3 cycles for float multiplication (and 1 for shift right):
https://uops.info/table.html?search=mulss&cb_lat=on&cb_tp=on...
https://uops.info/table.html?search=shr&cb_lat=on&cb_tp=on&c...
In throughput it's even less of a difference: 2 per cycle vs 3 per cycle.
> It's 3 cycles for float multiplication (and 1 for shift right):
So the shift is 3x faster in latency.
> In throughput it's even less of a difference: 2 per cycle vs 3 per cycle.
And still 50% faster in throughput.
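For anyone checking the arithmetic, both figures follow directly from the numbers quoted above. A minimal sketch, assuming those cycle counts as given (nothing is measured here, and the function names are made up for illustration):

```python
def latency_speedup(slow_cycles: float, fast_cycles: float) -> float:
    """Return how many times faster the lower-latency instruction is."""
    return slow_cycles / fast_cycles


def throughput_gain_percent(base_per_cycle: float, better_per_cycle: float) -> float:
    """Return the percentage gain in instructions issued per cycle."""
    return (better_per_cycle / base_per_cycle - 1.0) * 100.0


# Figures as quoted in the thread (assumed, not measured):
MUL_LATENCY, SHR_LATENCY = 3, 1      # cycles
MUL_PER_CYCLE, SHR_PER_CYCLE = 2, 3  # instructions per cycle

if __name__ == "__main__":
    print(latency_speedup(MUL_LATENCY, SHR_LATENCY))              # 3.0
    print(throughput_gain_percent(MUL_PER_CYCLE, SHR_PER_CYCLE))  # 50.0
```

Latency matters when each operation depends on the previous result; throughput matters when the operations are independent, which is why the two ratios differ.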