Pretty crazy that H_n - ln(n) has a series expansion with rational coefficients except the constant term
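For reference, the expansion being referred to is (assuming I have the coefficients right from the standard Euler–Maclaurin expansion) H_n ~ ln n + γ + 1/(2n) − 1/(12n^2) + 1/(120n^4) − …, where only the constant γ is, as far as anyone knows, not rational. A quick numerical sanity check in Python, with γ hard-coded to double precision:

    import math

    def harmonic_direct(n):
        # Direct sum 1 + 1/2 + ... + 1/n, accumulated smallest-first for accuracy.
        return math.fsum(1.0 / k for k in range(n, 0, -1))

    def harmonic_asymptotic(n):
        # ln n + gamma + 1/(2n) - 1/(12n^2) + 1/(120n^4): truncated asymptotic series.
        gamma = 0.5772156649015329
        return math.log(n) + gamma + 1/(2*n) - 1/(12*n**2) + 1/(120*n**4)

    for n in (10, 100, 1000):
        print(n, abs(harmonic_direct(n) - harmonic_asymptotic(n)))

The first neglected term is of order 1/n^6, so even at n = 10 the two values should agree to roughly nine decimal places.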

[deleted]

> For small n we can directly implement the definition. For large n, the direct approach would be slow and would accumulate floating point error.

is there a reason the direct definition would be slow, if we cache the prior harmonic number to calculate the next?

I think it's fair to say that summing the series directly would be slow, even if it's not slow when you already happen to have summed the previous n-1 terms.

Not least because for modestly-sized target sums the number of terms you need to sum is more than is actually feasible. For instance, if you're interested in approximating a sum of 100 then you need something on the order of exp(100) or about 10^43 terms. You can't just say "well, it's not slow to add up 10^43 numbers, because it's quick if you've already done the first 10^43-1 of them".
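(Concretely: H_n ≈ ln n + γ, so reaching a sum of 100 means ln n ≈ 100 − 0.577, i.e. n ≈ e^99.4 ≈ 1.5 × 10^43 terms.)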

It’s a natural observation, but it doesn’t address the floating point problem. I think the author should have said “slow or would accumulate floating point error” instead of “slow and would accumulate floating point error”.

You could compute in the reverse direction, starting from 1/n instead of starting from 1. This would produce a stable floating point sum, but this method is slow.

Edit: Of course, for very large n, 1/n becomes unrepresentable in floating point.
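A minimal sketch of the effect in Python, using math.fsum as the correctly rounded reference (the exact error sizes will vary with n and platform):

    import math

    def harmonic_forward(n):
        # 1 + 1/2 + ... + 1/n: later terms are tiny next to the running
        # total, so their low-order bits are rounded away as they're added.
        total = 0.0
        for k in range(1, n + 1):
            total += 1.0 / k
        return total

    def harmonic_reverse(n):
        # Same terms, smallest first: the partial sum grows gradually,
        # so less precision is discarded along the way.
        total = 0.0
        for k in range(n, 0, -1):
            total += 1.0 / k
        return total

    n = 10**6
    reference = math.fsum(1.0 / k for k in range(1, n + 1))
    print(abs(harmonic_forward(n) - reference))   # error typically larger
    print(abs(harmonic_reverse(n) - reference))   # error typically smaller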

Four techniques I’ve used to handle floating point imprecision/error:

1. Use storage that handles the level of scale and precision you need.

2. Use long/integer (if it fits). This is how some systems store money, e.g. as micros. It’s sensible, but there is still a limit: a wild swing of inflation may force you to migrate to different units, and then a wild swing of deflation may leave you up in arms over data loss. It also sounds great on paper but can be a pain for storing arbitrary scale and precision.

3. Use ranges when doing comparisons, i.e. fuzzy-match numbers within a tolerance to absorb floating point error (see the sketch after this list). This isn’t applicable to everything, but I’ve used it before; it worked fine and was much faster than BigDecimal, which mattered at the time. Long integers are really the best for this sort of thing, though; they’re much faster to work with.

4. BigDecimal. The problem with this is memory and speed. Also, since pi is irrational, you can’t store it fully in a BigDecimal, and doing calculations with pi as a BigDecimal would be so slow things would come to a halt; it’s probably the way aliens do encryption.
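A minimal sketch of items 2–4 in Python (the helper name is made up, and Decimal stands in for BigDecimal here):

    import math
    from decimal import Decimal, getcontext

    # Item 2: store money as integer micros; parse via Decimal so a string
    # like "19.99" never picks up binary rounding error on the way in.
    def dollars_to_micros(amount: str) -> int:
        return int(Decimal(amount) * 1_000_000)

    print(dollars_to_micros("19.99") + dollars_to_micros("0.01"))   # 20000000, i.e. exactly 20 dollars

    # Item 3: compare floats within a tolerance instead of exactly.
    a = 0.1 + 0.2
    print(a == 0.3)                            # False, thanks to binary rounding
    print(math.isclose(a, 0.3, rel_tol=1e-9))  # True: fuzzy match within a range

    # Item 4: arbitrary-precision decimals, at a cost in memory and speed.
    getcontext().prec = 50
    print(Decimal(1) / Decimal(7))             # 50 significant digits, but never all of pi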

Interesting follow-up question: What is the distance between the set of harmonic numbers and the integers? i.e. is there a lower bound on the difference between a given integer and its closest harmonic number? If so, for which integer is this achieved?

There’s a trivial lower bound of zero, for n = 1.

For n > 1, there isn’t a positive lower bound. No harmonic number after H_1 is an integer again (https://en.wikipedia.org/wiki/Harmonic_series_(mathematics)#...), and because the gap between successive partial sums goes to zero while the series grows to arbitrary values, past any n the partial sums cross some integer, and at that crossing the nearer of the two neighbouring partial sums is within 1/(2n) of it.

Spoiler: there is a simple argument against the existence of such a lower bound.

No, because the terms tend monotonically towards zero. Let an integer m be given, with closest harmonic number H_n (i.e. n minimizes |H_n-m|). Then m lies either between H_n and H_(n+1) or between H_(n-1) and H_n, so |H_n-m| < H_(n+1) - H_(n-1) = 1/n + 1/(n+1). Since the harmonic numbers diverge, taking m large enough forces n to be large, which makes that bound arbitrarily small.
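A quick empirical check of this in Python with doubles (the function name is just for illustration): for each integer m, walk the partial sums until they first reach m and report how close the nearer neighbour is. The distances shrink roughly like exp(−m).

    import math

    def closest_crossing_distance(m):
        # Walk the partial sums until H_n first reaches m, then report how
        # close m is to the nearer of H_(n-1) and H_n.
        h, n = 0.0, 0
        while h < m:
            n += 1
            h += 1.0 / n          # h is now H_n
        return n, min(h - m, m - (h - 1.0 / n))

    for m in range(2, 11):
        n, d = closest_crossing_distance(m)
        print(f"m={m}: first crossed at n={n}, distance ~ {d:.2e}")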

On this page, https://www.johndcook.com/blog/special-numbers/, I remember the 'schizophrenic numbers'.