An aside, but I recently learned -- if one is willing to use a very modest amount of memory -- summing floating-point numbers with no loss of precision is effectively a solved problem with the XSUM algorithm.
That paper explains some useful optimisation details, but the core observation is simple: every float is (infinities aside) an integer multiple of a known tiny fraction, the smallest positive non-zero float, so we can certainly sum them exactly.
Not if the ratio between the largest and smallest floats is very large: on the order of 2^(2^n), where n is the number of bits in the exponent.
I think you either haven't thought about this or you did your math wrong.
You need (2^e)+m+1 bits, where e is the number of exponent bits and m the number of mantissa bits. That is more bits than would fit in the cheap machine integer type you just have lying around, but it's not that many in real terms.
Let's try a tiny one first: the "half-precision" or f16 type has 5 bits of exponent, 10 bits of fraction, and 1 sign bit, so we need 2^5 + 10 + 1 = 43 bits. That will actually fit in the 64-bit signed integer type on a modern CPU.
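To make that concrete, here's a minimal sketch of the idea (my own illustration, not the XSUM paper's code; numpy.float16 is used only to get a half-precision type):

```python
import numpy as np

# Sketch: every finite f16 is an integer multiple of 2**-24, its smallest
# positive subnormal, so the exact running sum fits in an ordinary integer
# comfortably under 64 bits.

F16_UNIT_EXP = 24  # smallest positive f16 subnormal is 2**-24

def f16_units(x) -> int:
    """Exact value of an f16, as an integer count of 2**-24 units."""
    n, d = float(np.float16(x)).as_integer_ratio()  # exact; d is a power of 2 <= 2**24
    return n * ((1 << F16_UNIT_EXP) // d)

def exact_f16_sum(values) -> float:
    acc = 0
    for v in values:
        acc += f16_units(v)
    assert acc.bit_length() < 64      # ~43 bits plus log2(len(values)) of headroom
    return acc / (1 << F16_UNIT_EXP)  # rounds once, on the way back out

vals = [0.1] * 10_000
naive = np.float16(0)
for v in vals:
    naive += np.float16(v)            # small addends get lost to rounding as the sum grows
print(float(naive), exact_f16_sum(vals))
```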
Now let's try f64, the big daddy: 11 exponent bits, 52 fraction bits, 1 sign bit, for a total of 2^11 + 52 + 1 = 2048 + 52 + 1 = 2101 bits. As I said, that doesn't fit in our machine integer types, but it's still much smaller than a kilobyte of RAM.
Edited: I can't count, though it doesn't make a huge difference.
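The same sketch for f64 (again my illustration, not the paper's implementation), with Python's arbitrary-precision integers standing in for that ~2101-bit accumulator:

```python
# Count every finite double as an integer multiple of 2**-1074, the smallest
# positive double subnormal, and accumulate those counts exactly in a big int.

F64_UNIT_EXP = 1074  # smallest positive f64 subnormal is 2**-1074

def f64_units(x: float) -> int:
    """Exact value of a finite double, as an integer count of 2**-1074 units."""
    n, d = x.as_integer_ratio()               # exact; d is a power of 2 <= 2**1074
    return n * ((1 << F64_UNIT_EXP) // d)

def exact_f64_sum(values) -> float:
    acc = sum(f64_units(v) for v in values)   # never rounds; a few hundred bytes at most
    return acc / (1 << F64_UNIT_EXP)          # one correctly rounded conversion at the end

vals = [1e16, 1.0, -1e16]
print(sum(vals))                              # 0.0 -- the 1.0 is lost to rounding
print(exact_f64_sum(vals))                    # 1.0 -- the exact sum
```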