Since LLMs are great at coding but bad at logic, maybe an approach like this can bridge the gap? First, let the LLM translate natural language into a formal language; then let it use a logic engine to make verifiable, correctness-preserving transformations; finally, translate the result back into natural language.
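To make the shape of that pipeline concrete, here's a toy Python sketch: the two LLM translation steps are stubbed out with hard-coded data, and a tiny forward chainer (handling only unary single-variable rules) stands in for a real logic engine. All names and the fact encoding are invented for illustration:

```python
# Toy sketch of the NL -> formal -> logic engine -> NL pipeline.
# The two translation steps would be done by an LLM; here they are
# hard-coded so the correctness-preserving middle step is the focus.

# Step 1 (LLM, stubbed): "Socrates is a man. All men are mortal."
facts = {("man", "socrates")}
rules = [(("man", "X"), ("mortal", "X"))]  # man(X) -> mortal(X)

def forward_chain(facts, rules):
    """Derive new facts until a fixed point; each step is a sound inference."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (p_pred, _), (c_pred, _) in rules:
            for (f_pred, f_arg) in list(derived):
                if f_pred == p_pred and (c_pred, f_arg) not in derived:
                    derived.add((c_pred, f_arg))
                    changed = True
    return derived

# Step 2 (logic engine): verifiable, deterministic transformation.
all_facts = forward_chain(facts, rules)

# Step 3 (LLM, stubbed): render the derived facts back as English.
for pred, arg in sorted(all_facts):
    print(f"{arg.capitalize()} is {pred}.")
```

In a real system the middle step would be a Prolog engine or an SMT solver rather than this toy; the point is that it is deterministic and correctness-preserving, so only the two translation steps require trusting the LLM.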

People are already using Prolog for this:

1) A series of excellent and detailed blog posts by Eugene Asahara, "Prolog in the LLM Era" - https://eugeneasahara.com/category/prolog-in-the-llm-era/

2) Previous HN discussion Use Prolog to improve LLM's reasoning - https://news.ycombinator.com/item?id=41831735

3) User "bytebach" gives a nice example of using Prolog as an intermediate DSL in the prompt to an LLM so as to transform declarative English -> imperative code - https://news.ycombinator.com/item?id=41549823

There's also [1], containing further bibliography references along with practical applications in discrete planning.

Prolog is quite popular and successful as a target for LLMs. And that's no accident: Prolog was originally introduced to represent natural-language statements in (predicate) logic.

[1]: https://quantumprolog.sgml.net/llm-demo/part1.html

That is pretty neat!

As a big Prolog fan, thanks for sharing those resources.

> So first let it translate natural language to a formal language, from there allow it to use a logic engine to make verifiable transformations (correctness-preserving), and finally translate back to natural language.

Linguists in the Richard Montague tradition have indeed attempted to use tools like formal logic, lambda calculus, continuations, monads, modalities etc. to try and understand the semantics of natural language in a way that's both logical/formal and compositional - i.e. accounting at least partially for the "deep" syntax of natural language itself, such that a fragment can be said to have a semantics of its own and the global semantics of a broader construction arises from "composing" these narrower semantics in a reasonably straightforward way.

This is pretty much the same as trying to take the "let's translate natural language into formal logic" proof-of-concept exercises from a text like OP (or from your average logic textbook) seriously and extending them to natural language as a whole. It turns out that this is really, really hard, because natural language mixes multiple "modalities" together in what looks like a very ad-hoc way. We only barely have the tools in formal logic to try and replicate this, such as continuations, modalities and monads. (Linguists actually talk about many phenomena of this kind, talking about "modalities" is just one example that's both general enough to give a broad idea and happens to be straightforward enough on the logical side. You have quantification, intensionality, anaphora, scope, presupposition, modality proper, discourse-level inference, pragmatics, ellipsis, indexicals, speech acts, etc. etc. etc.)
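To give a flavor of the compositional program: in the Montague tradition, a determiner like "every" denotes a higher-order function over predicates, and the sentence meaning falls out of plain function application. A toy model-theoretic sketch over a finite domain (the domain and lexicon are invented for illustration):

```python
# Toy Montague-style compositional semantics over a finite domain.
domain = {"rex", "fido", "felix"}
dog    = lambda x: x in {"rex", "fido"}   # [[dog]]   : e -> t
barks  = lambda x: x in {"rex", "fido"}   # [[barks]] : e -> t
cat    = lambda x: x == "felix"           # [[cat]]   : e -> t

# [[every]] = lambda P. lambda Q. forall x. P(x) -> Q(x)
every = lambda P: lambda Q: all(Q(x) for x in domain if P(x))

# Sentence meanings arise by composing the word meanings:
print(every(dog)(barks))   # "every dog barks" -> True
print(every(cat)(barks))   # "every cat barks" -> False
```

The hard part described above is that almost nothing in real language composes this cleanly once quantifier scope, intensionality, anaphora, ellipsis, etc. enter the picture.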

And because the semantics of natural language is both so general and so hard to pin down, it doesn't seem useful to "reason" about the logical semantics of natural languages so directly. You can of course use logical/mathematical modeling to address all sorts of problems, but this doesn't occur via a verbatim "translation" from some specific language utterance.

I've been strapping different LLM-based setups to Lean 4 with a variety of prompting methods. My biggest conclusion is that LLMs are worse at formalizing than humans are. Additionally, for Lean 4 specifically, I don't think there's enough training data.

I'm of the opinion that formalization is the biggest bottleneck of current-generation LLMs. However, I don't think this necessarily suggests that LLMs don't benefit from formal methods. Given existing abstractions, Lean 4's exceptional tooling allows for more efficient iteration with LLMs and requires less human supervision, since Lean's language server provides specific and actionable feedback whenever the LLM makes a mistake. I've also noticed that LLMs can reason about code written in Lean 4 far more effectively than in Python, despite Python having orders of magnitude more training data than Lean.

Nonetheless, I concur that LLMs don't yet know how to translate a request stated in a prompt into a complete Lean 4 formalization. My practice so far has usually been to first choose an existing reference file that is similar to my desired goal, and use this reference as "inspiration" for how the LLM should go about the formalization.
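For readers who haven't seen this loop in action, here is the kind of feedback being described, on a deliberately trivial example (the theorem name is mine; `Nat.add_comm` is in Lean 4 core):

```lean
-- A toy formalization target: commutativity of addition on Nat.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  -- If an LLM tries `rfl` here, the language server rejects it with a
  -- precise message pointing at the unsolved goal `a + b = b + a`,
  -- and that message can be fed back to the model verbatim.
  exact Nat.add_comm a b
```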

Yeah, we really need LLMs to work swimmingly with Lean 4. It is currently hot garbage: it does not understand proof composition, exploring proof extensions, lemma search, etc. It does not explore an open-ended node in a mathematical knowledge graph by substituting various options.

I'd happily work with someone on a conversational theorem prover, if anyone's up for it.

Join the Lean Zulip. There are many people interested in this.

https://leanprover.zulipchat.com/

I think a big issue with this approach is that the initial and final steps are prone to sycophancy: the machine wants you to believe it's getting the job done, which may lead it to produce something correct-looking over something correct. The middle steps (correct-by-construction transformations) do not need an LLM at all. That's what a certified compiler does.
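A toy illustration of such a correctness-preserving middle step, with the preservation checked mechanically rather than trusted (the formula encoding and function names are mine):

```python
from itertools import product

# Boolean formulas as nested tuples:
#   ("var", name) | ("not", f) | ("and", f, g) | ("or", f, g)

def demorgan(f):
    """Rewrite not(and(a, b)) -> or(not a, not b), recursively."""
    if f[0] == "not" and f[1][0] == "and":
        _, (_, a, b) = f
        return ("or", ("not", demorgan(a)), ("not", demorgan(b)))
    if f[0] == "not":
        return ("not", demorgan(f[1]))
    if f[0] in ("and", "or"):
        return (f[0], demorgan(f[1]), demorgan(f[2]))
    return f

def evaluate(f, env):
    op = f[0]
    if op == "var": return env[f[1]]
    if op == "not": return not evaluate(f[1], env)
    if op == "and": return evaluate(f[1], env) and evaluate(f[2], env)
    return evaluate(f[1], env) or evaluate(f[2], env)

def equivalent(f, g, names):
    """Exhaustively check that the rewrite preserved meaning."""
    return all(evaluate(f, dict(zip(names, vs))) == evaluate(g, dict(zip(names, vs)))
               for vs in product([False, True], repeat=len(names)))

f = ("not", ("and", ("var", "p"), ("var", "q")))
g = demorgan(f)
print(g)                             # ('or', ('not', ('var', 'p')), ('not', ('var', 'q')))
print(equivalent(f, g, ["p", "q"]))  # True
```

A certified compiler does the same thing in spirit, except the equivalence proof is established once and for all instead of being re-checked per input.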

I think the way forward, for the immediate future, is to feed AI agents with a mixture of (hand-written) natural language and formal blueprints, then use as many mechanized analysis methods as possible on the generated code (from unit/regression testing to static analysis, and possibly more powerful software verification procedures). Potentially feed the output of these analyses back to the agents.
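As a minimal sketch of that feedback loop (the harness and the candidate snippet are hypothetical, and a real setup would add static analysis, sandboxing, etc.):

```python
import subprocess, sys, tempfile

def check_candidate(code: str, test: str) -> str:
    """Run a generated candidate against a test and return diagnostics
    that can be fed back into the agent's next prompt."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test)
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True)
    return "ok" if result.returncode == 0 else result.stderr

# A (hypothetical) LLM-generated candidate:
candidate = "def clamp(x, lo, hi):\n    return max(lo, min(hi, x))\n"
print(check_candidate(candidate, "assert clamp(5, 0, 3) == 3"))  # ok
```

On failure, the returned traceback is exactly the kind of specific, actionable diagnostic worth feeding back to the agent.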

That's the approach we're taking to verify LLM-generated SQL code at http://sql.ai.

Essentially, there is growing interest in the "formal" math community (combinatorics, mining, etc.) in doing exactly this.