Hacker News

Sure, what the program does is not interesting by itself, neither is that you can use AI to create programs to do polygon intersection.

The main feature, that I hope is interesting in this submission, is that the program is formally verified and how I used formal verification together with AI to create it.

Formal verification means that a mathematical proof is provided that the program satisfies a specification. And that proof is checked automatically by a deterministic system, the lean checker, which we can trust, in constrast to error prone LLMs.

I gave the agent a formal specification m1.interior ∩ m2.interior = result.interior and it produced an implementation together with such a formal proof. With this approach we can treat much of the work of the agent as a black box, which we don't have to review to judge correctness.

I think the project shows that as AI agents get more capable, an approach like this is starting to get practical for certain problems like polygon intersection for which there is a concise way to specify the problem.

Yes, it runs significantly longer.

Opus 4.8 ran autonomously for about 8 hours to provide program and proof of correctness, given the formal specification. And in previous experiments, Opus 4.7 failed and I was only able to do it using that model when I cut the work for the agents in smaller steps.

Much of the work the agent is to provide the proof of correctness. The upside is that less time is needed for human review and we can guarantee the absence of bugs that might be expensive when they come up in production.

Historically formal verification was only worth it for very critical software. In the readme I reference related work from NASA, that implemented and verified a different algorithm concerned with polygons, with the intended application to compute keep out zones for autonomous vehicles. This was 2021 before capable LLMs and in the paper they mention that they manually wrote 700 lemmas to produce such a formal proof manually. I hope that as it gets cheaper now, formal verification is used more widely.