The link was to show Coq's floating point library.

> As a concrete example, how would you prove an implementation of a Kalman filter is correct?

This would be fun enough to implement if I had more time. But I need to spend time with my family. I'm usually loath to just post LLM output, but in this case, since you're looking just for the gist, and I don't have the time to write it out in detail, here's some LLM output: https://pastebin.com/0ZU51vN0

Based on a quick skim, I can vouch for the overall approach. There are some minor errors and things I would do differently, and I'm extremely doubtful it typechecks (I didn't use a coding agent, just a chat window), but it captures the overall idea and gives some of the concrete syntax. I would also probably have a third implementation of the 1-d Kalman filter and would prefer some more theorems around the correctness of the Kalman filter itself (in addition to numerical stability). And of course a lot of the theorems are just left as proof stubs at the moment (but those could be filled in given enough time).

But it's enough to demonstrate the overall outline of what it would look like.

The overall approach I would take with floating point algorithms is exemplified by the following pseudo-code for a simple single-argument function. We will first need some floating point machinery that establishes the relationship between floating point numbers and real numbers.

  # Calculates the corresponding real number from the sign, mantissa, and exponent
  floatToR : Float -> R
  
  # Logical predicate that holds if and only if x can be represented exactly in floating point
  IsFloat(x: R)
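The link between `floatToR` and `IsFloat` can be made explicit. One natural reading (an assumption on my part, since it isn't spelled out above) is that `IsFloat` holds exactly on the image of `floatToR`:

  # x is a float exactly when some Float maps onto it under floatToR
  IsFloat(x: R) := exists (f: Float), floatToR(f) = x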
Then define a set of three functions: `f0`, which operates on the idealized reals; `f1`, which operates on floating point numbers as mechanically defined via binary mantissa, exponent, and sign; and `f2`, which operates on the finite subset of the reals that have an exact floating point representation.

  f0 : R -> R
  f1 : Float -> Float
  f2 : (x : R) -> IsFloat(x) -> R
The idea here is that `f0` is our "ideal" algorithm. `f1` is the algorithm implemented with exact adherence to low-level floating point operations. And `f2` is the ultimate algorithm we'll extract out to runnable code (because it might be extremely tedious, messy, and horrendous for runtime performance to directly tie `f1`'s explicit representation of mantissa, exponent, and sign, which is likely some record structure with various pointers, to the machine 32-bit float).
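
To make the three levels concrete with a toy example of my own (not from the pastebin), take the single-argument function x/2 + 1. Here `fdiv`/`fadd` are hypothetical names for the IEEE operations on `Float`, and `odiv`/`oadd` for the corresponding correctly-rounded operations on exactly-representable reals:

  # Ideal version on the reals
  f0(x) := x / 2 + 1
  # Bit-level version on Float, using the (hypothetically named) IEEE operations
  f1(x) := fadd(fdiv(x, 2), 1)
  # Same computation on exactly-representable reals, via correctly-rounded operations
  f2(x, isFloatProof) := oadd(odiv(x, 2), 1)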

Then I prove the exact correspondence between `f1` and `f2`.

  f1_equal_to_f2 : forall (x: R) (isFloatProof: IsFloat(x)) (f: Float), floatToR(f) = x -> floatToR(f1(f)) = f2(x, isFloatProof)
This gives the following corollary FWIW.

  f2_always_returns_floats : forall (x: R) (isFloatProof: IsFloat(x)), IsFloat(f2(x, isFloatProof))
This lets me throw away `f1` and work directly with `f2`, which makes it easier to relate `f2` and `f0`, since both work on reals and I don't need to constantly convert between `Float` and `R`.

So then I prove numerical stability of `f2` relative to `f0`. I start with a Wilkinson-style backwards error analysis.

  f2_is_backwards_stable : forall (x: R) (isFloatProof: IsFloat(x)), exists (y: R), f2(x, isFloatProof) = f0(y) and IsCloseTo(y, x)
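`IsCloseTo` is left abstract here; the usual reading for this kind of analysis (my assumption, not a fixed definition) is a relative error bound in terms of the unit roundoff:

  # u is the unit roundoff of the format (2^-24 for binary32, 2^-53 for binary64);
  # c is a small constant depending on how many rounded operations f2 performs
  IsCloseTo(y, x) := abs(y - x) <= c * u * abs(x)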
Then, if applicable, I would prove a forward error analysis (which requires bounding the input and then showing that overflow doesn't happen, and optionally that underflow doesn't happen).

  f2_is_forwards_stable : forall (x: R) (isFloatProof: IsFloat(x)), IsReasonablyBounded(x) -> IsCloseTo(f2(x, isFloatProof), f0(x))
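`IsReasonablyBounded` is similarly abstract; one plausible reading (again my assumption) is a per-algorithm range restriction that keeps every intermediate of `f2` away from overflow, and optionally away from underflow:

  # hi (and optionally lo) are hypothetical bounds chosen per algorithm
  IsReasonablyBounded(x) := abs(x) <= hi    # and optionally lo <= abs(x)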
And then, given both of those, I might go on to prove some conditioning properties.
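
Conditioning is a property of the ideal map `f0` rather than of the implementation; a sketch of the kind of statement I have in mind (the constants `kappa` and `delta` are hypothetical and chosen per problem) is:

  # Relative condition number bound: a relative input perturbation of size at most
  # delta changes f0's output by at most a factor of kappa more, relatively
  f0_is_well_conditioned : forall (x dx: R), IsReasonablyBounded(x) -> abs(dx) <= delta * abs(x) -> abs(f0(x + dx) - f0(x)) <= kappa * delta * abs(f0(x))
Combined with the backwards stability result, a bound like this yields a forward error bound, which is the standard route when a direct forward analysis is awkward.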

Then finally I extract `f2` to runnable code (and make sure the compilation process after that uses all the right floating point flags, e.g. taking FMA into consideration or just outright disabling it if we didn't include it in our verification, making sure the various fast math modes are turned off, etc.).
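
For concreteness, the extraction step in Coq looks roughly like this (a sketch only; the target file and identifier are placeholders):

  (* Assumes a Coq development; "kalman.ml" and f2 are placeholder names *)
  Require Extraction.
  Extraction Language OCaml.
  Extraction "kalman.ml" f2.
And if the extracted code (or any C shims around it) eventually passes through GCC or Clang, the relevant knobs include `-ffp-contract=off` to prevent unverified FMA contraction, and keeping `-ffast-math` and `-funsafe-math-optimizations` off.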