I wonder how formal methods can help us move faster with GenAI.

Is it that they can help write formulas faster? That they can help ensure formulas match the system they're modeling faster? If the problem you think formal methods will help with is sloppy code, isn't the verification code going to be sloppy as well, unless some (not sloppy) intelligence carefully confirms that the specification matches the target system, which was the labor that previously made formal methods too expensive? I guess I don't understand how the argument works if code was previously less sloppy and verification was too expensive, and now code is more sloppy, and there's more of it, but somehow the sloppy intelligence will make verification move fast enough to make it worthwhile.

Unless we have some non-sloppy intelligence that's less of a bottleneck on verification than humans, how are we in a better place?

Maybe it's that investing that huge amount of labor of verification by human experts is now worth it because so much code will be produced that uses the verification systems that the investment will now pay off. But that requires creating pretty general verification systems, such as type system verification or something (which is what they seem to be aimed at), rather than individually verifying software systems like the micro-kernel example.

In other words, maybe the play is to invest in reusable verification systems that can be run tons of times on new code and systems. If so, it's surprising that this wasn't always the strategy.

> isn't the verification code going to be sloppy as well

The beauty of formal methods is it doesn't matter if your proof is sloppy. As long as it passes verification, it is correct. And unlike in pure math, the proof that a software system is correct is usually a huge mess of special cases, loop invariants, proofs by induction, and boilerplate that requires a large amount of human labour while providing no insight.

Proofs are also brittle: a tiny change in the code can force you to throw your proof away and start from scratch.

To me, the exciting thing about formal methods in the LLM era is it allows humans to offload the difficult and tedious work of writing proofs to a computer. Taken to an extreme, the human could live entirely in the world of a formal specification, and the LLM could generate 100% of the code. The code may be a mess, but if the system proves it satisfies the spec then it can't be wrong.

So, formal methods produce runnable systems, but communication remains the challenge.

If a formal spec is messy, then it's a proof of ... what, exactly?

A formal specification that bridges tech and product, that lets non-technical contributors read and discuss all the logical nuances, directly as operational code, at product's level of abstraction of interest, would transform a lot.

It's no longer a challenge to create code, it's a challenge to create business requirements and translate them into systems.

The spec and proof are separate. In this blog article he mentions seL4 formal verification, where they state that the spec was 4900 lines of Isabelle and the proof was 200K lines. Obviously human has to understand the spec deeply.

There's an information theoretic aspect about generating a proof which is essentially not human readable from 4900 lines of spec. I wonder how much additional signal they're getting out beyond what's in that 4900 lines, and what's the percentage of noise in the 200k lines of proof?

The problem is that generating either code or proofs with LLMs is very expensive, and generating good proofs (I don't mean elegant, I mean proving the most important properties) is probably not very fast, either. Reducing the verification time of a program from 100 years to 10 years or the cost from $1bn to $100m is still not practical enough to become truly mainstream.

Things can be improved when people help guide and focus the LLMs, but these people still need to be formal methods experts.

The general idea is that formal methods are self-verifying, up to a point. Sloppy formal methods simply won't prove.

The point up to which formal methods stops is: do the formally encoded requirements actually encode what I want to be true?

One can make the argument that the requirements is a much smaller surface to verify than that of the entire program.

The counter argument is that figuring out what you want the program to do has always been the hardest part of programming, and that programming in itself is a journey to discover latent requirements.

> One can make the argument that the requirements is a much smaller surface to verify than that of the entire program.

This argument is unfortunately empirically false for any program of any meaningful complexity (>1000 lines, probably even as low as >100 lines ignoring well-defined algorithms and data structures) using current formal methods.

Complete formal specifications are usually multiple times larger than the corresponding source code and encode esoteric propertys necessary for the proof, but which are largely even more impenetrable than a undocumented codebase.

So, it is both harder to figure out if you encoded the desired requirements and it is more complex. Your only advantage is confidence that what you wrote down is proven.

> Complete formal specifications are usually multiple times larger than the corresponding source code and encode esoteric propertys necessary for the proof, but which are largely even more impenetrable than a undocumented codebase.

Is it possible that the spec could be factorised into something higher-level and more modular? If not, can you give a flavour of the type of unavoidable esoteric detail? (One area I can see lots of complications is when dealing with versioned data, especially in multiple interacting systems -- naively, correctness needs to be proven for every combination of versions, even if some are never seen.)

> I wonder how formal methods can help us move faster with GenAI.

Perhaps better the other way around? How GenAI can help us move faster with formal methods.

Rather than improve the quality of new software, there's a mountain of existing codebases that could benefit from this.