Yeah, fair enough. The exponent of an FP32 has only 8 bits, versus 11 bits for FP64. I'll make an edit to make this explicit.

It's also fairly interesting how Nvidia handles this for the Ozaki scheme: https://docs.nvidia.com/cuda/cublas/#floating-point-emulatio.... They generally need to align all numbers in a matrix row to the row's maximum exponent, but depending on the scale difference between two numbers this might not be feasible without significantly extending the number of mantissa bits. So they decide dynamically (Dynamic Mantissa Control) whether to use the Ozaki scheme or to execute on native FP64 hardware. Alternatively, the user can fix the number of mantissa bits (Fixed Mantissa Control), which is faster but no longer guarantees FP64 precision.
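To make the shared-exponent idea concrete, here's a toy Python sketch (names like split_row, num_slices and bits_per_slice are made up for illustration; this is not cuBLAS's actual algorithm). Every value in a row gets aligned to the row's maximum exponent and its mantissa is carved into fixed-width integer slices; whatever the slices can't represent lands in a residual, which is exactly what blows up when the dynamic range within a row is large:

    import math

    def split_row(row, num_slices=3, bits_per_slice=8):
        # Shared exponent for the whole row: the largest binary exponent.
        max_exp = max(math.frexp(x)[1] for x in row)
        slices = [[0] * len(row) for _ in range(num_slices)]
        residual = list(row)
        for s in range(num_slices):
            # Each slice covers the next bits_per_slice bits below max_exp.
            scale = 2.0 ** (max_exp - (s + 1) * bits_per_slice)
            for j in range(len(row)):
                q = math.floor(residual[j] / scale)   # integer chunk
                slices[s][j] = q
                residual[j] -= q * scale
        # By construction: x == sum_s slices[s][j] * scale_s + residual[j].
        # A large residual means a fixed slice count can't reach FP64 accuracy.
        return max_exp, slices, residual

    # Huge dynamic range within one row: the small entry falls entirely
    # into the residual with 3 x 8-bit slices, so a scheme with a fixed
    # mantissa budget would have to fall back to native FP64 here.
    _, _, res = split_row([1.0e12, 3.0e-5], num_slices=3, bits_per_slice=8)
    print(res)   # second residual is still the full 3.0e-5

That's roughly the trade-off behind Dynamic vs. Fixed Mantissa Control as I understand it: either spend the extra check (and possible FP64 fallback) per row, or accept that a fixed bit budget silently drops the small entries.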