The thing I'm stuck on is that, yes, the proof is correct.
But what guarantees do I have that the proof or theorem is actually representative of the program I'm trying to specify?
And beyond that, what guarantees are there that the resulting end program actually adheres to either the specs or what has been proven?
Like it seems the goal here is `English -> Lean -> Code in whatever language` instead of just `English -> Code` but there's no way to be certain the translation step on either side of Lean has actually been done correctly.
I've done a lot of experiments around this so far this year and theorem provers can certainly help provide LLMs with more context but there is no tooling to actually verify that the resulting Python fully covers the corresponding Lean that was fed in.
It's often said that formal verification moves bugs from the implementation to the specification, and there's also the problem of proving equality between the formally proven specification and the implementation.
The first problem is just hard, the second can be approached by using a language like F* or Dafny that compiles to executable code. IIRC Amazon has had some success by using Lean for the specification and using fuzzing to test that the implementation matches.
You're using an LLM to translate Lean to Python? Care to tell us more?
If you want to prove stuff about programs then Lean is a very capable tool. But it's intended that you write the program itself in Lean and not in some other language.
I guess if you ask an LLM nicely it will attempt to translate a program that has already been written in Lean to one in Python, but why would you want to do that? I doubt it will run any faster.