I'd been waiting for a post like this to hit the front page of Hacker News any day now. Ever since Opus 4.5 and GPT 5.2 came out (mere weeks ago), I've been writing tens of thousands of lines of Lean 4 in a software engineering job, and I feel like we are on the eve of a revolution. What used to take me 6 months of work during my PhD in Coq (now Rocq) now takes anywhere from a few hours to a few days. Whole programming languages can get formalized, executable semantics in little time. Lean 4 already has a gigantic collection of libraries, not only for math but also for computer science; I expect open source projects to sprout with formalizations of every language, protocol, standard, and algorithm you can think of.

If you have never written formal proofs but are intrigued by them, try asking a coding agent to do some basic verification. You will not regret it.
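To give a concrete taste of what "basic verification" can look like, here is a minimal Lean 4 sketch of my own (the names and the example are purely illustrative, not from any real project): a small function together with machine-checked facts that Lean will refuse to accept unless the proofs actually go through.

    -- A toy definition plus two verified properties about it.
    def myMax (a b : Nat) : Nat :=
      if a ≤ b then b else a

    -- The result is at least as large as either argument;
    -- `split` case-splits on the `if`, `omega` closes the arithmetic.
    theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
      unfold myMax
      split <;> omega

    theorem myMax_ge_right (a b : Nat) : b ≤ myMax a b := by
      unfold myMax
      split <;> omega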

Formal proof is not just about proving things; it's also about disproving them by finding counterexamples. Once you have stated your property, you can let QuickCheck-style tools like plausible attack it, possibly helped by a suitable generator, which does not have to be random: it can be steered by an LLM as well.
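As a minimal sketch of what that looks like (assuming the Plausible package is available in your project; the property below is deliberately false, which is the whole point):

    import Plausible

    -- A deliberately false claim: reversing a list leaves it unchanged.
    -- `plausible` generates inputs and reports a counterexample such as
    -- `l := [0, 1]`, so this `example` is rejected -- exactly the feedback
    -- you want when probing a property.
    example (l : List Int) : l.reverse = l := by
      plausible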

Even further, I'm toying with the idea of including LLMs inside the formalization itself. There is an old and rich idea in formal proof, that of certificates: rather than proving that the algorithm producing a result is correct, just have untrusted code compute a checkable certificate and verify that certificate. Checkable certificates can be produced by unverified programs, by humans, and now by LLMs. Properties and invariants can all be "guessed" without harm by an LLM and still have to pass a checker. We have truly entered an age of oracles. It's not halting-problem-oracle territory, of course, but for practical purposes it sometimes feels pretty close. LLMs are already better at math than most of us, and certainly than me, so any problem I could plausibly solve on my own, they will solve faster, without my having to wonder whether there is a subtle bug in the proof. I still need to look at the definitions and statements, of course, but my role has changed from finding to checking. Exploring the space of possible solutions is now mostly done better and faster by LLMs. And you can run as many in parallel as you can keep up with, in attention and in time (and money).
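To make the certificate idea concrete, here is a toy sketch of my own (hypothetical names, not from any real development): the oracle, whether an unverified program, a human, or an LLM, only proposes a witness, and a tiny trusted checker decides whether to accept it. In a real development you would also prove the checker sound.

    -- Trusted checker: `d` is a certificate that `n` is composite.
    -- The oracle that found `d` can be arbitrarily untrusted; only this
    -- function needs to be trusted (and, ideally, proved sound).
    def checkCompositeCert (n d : Nat) : Bool :=
      1 < d && d < n && n % d == 0

    -- The certificate can come from anywhere: a search procedure, a human, an LLM.
    #eval checkCompositeCert 91 7   -- true:  91 = 7 * 13
    #eval checkCompositeCert 97 5   -- false: 5 does not divide 97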

If anyone else is as excited about all this as I am, feel free to reach out in the comments; I'd love to hear about people's projects!

People are sleeping on the new models being capable of this, 100%. I've been telling Opus to write Alloy specs recently and it… just does. Ensuring conformance is rapidly becoming affordable; folks in this thread need to update their priors!

Do you now use Lean instead of Rocq because your new employer happens to prefer it, or is it superior in your opinion? Which one would you recommend looking at first?

Where do you work that you get to write Lean? That sounds awesome!

I can't disclose that, but what I can say is that no one at my company writes Lean yet. I'm basically experimenting with formalizing in Lean things I normally do in other languages, and getting results exciting enough that I hope to trigger adoption internally. But this is bigger than any single company!

This is perhaps only tangentially related to formal verification, but it made me wonder: what efforts are there, if any, to use LLMs to help solve some of the tough open questions in math and CS (P=NP, etc.)? I'd be curious to know how a mathematician would approach that.

So, as for math at that level, (the best) humans are still kings by far. But things are moving quickly, and there is very exciting human-machine collaboration; one need only look at recent interviews with Terence Tao!

I agree. I think we've gotta get through a rough couple of "AI slop" years of code, and we'll come out the other side with some incredible tools.

The reason we don't all write code to the standard that can operate the Space Shuttle is that we don't have the resources, and the projects most of us work on allow some wiggle room for bugs since lives generally aren't at risk. But we'd all love to check in code that was verifiably bug-free, exploit-free, secure, etc., if we could get that at a low, low price.

At some level it's not really an engineering issue. "Bug free" requires some externally known goal, specified with enough fidelity to classify every behavior as "bug" or "not bug". That really doesn't exist in the vast majority of software projects. It is of course occasionally true that programmers write code that explicitly doesn't meet one of the requirements they were given, but most of the time the issue is that nothing was specified for certain cases, so the code does whatever was easiest to implement. It is only when those unspecified cases are encountered (via a user report, a product demo, or manual QA) that the behavior gets classified as "bug" or "not bug".

I don't see how AI would help with that, even if it made writing code completely free. Even if the AI writes the spec and fully specifies all possible outcomes, the human reviewing it will glance over the spec and approve it, only to change their mind when confronted with the actual behavior or user reports.

> I don't see how AI would help with that

What if the AI kept bringing up unspecified cases, and all you (the human) had to do all day was tell it what the behavior should be in each case? In this model the AI would not specify the outcomes; the specification is whatever you initially specified plus your responses to the AI's questions about the outcomes. At some point you'd decide that you'd answered enough questions (or the AI could not come up with any more unspecified cases), and the bugs would live in whatever remained unspecified, but it would still mean substantially more thinking about cases than we do now.

This sounds amazing! What kind of systems take you a few hours to a few days now? Just curious whether it works only in a niche (like sequential code) or for concurrent and distributed systems as well.

> I've been writing tens of thousands of lines of Lean 4 in a software engineering job

I am wondering what exactly you are doing. What tasks are you solving with generated Lean?

Having known nothing of this field before now, I have to say your excitement has me excited!