LLMs lack logical constraints in the generative process; they only learn probabilistic ones. If you apply logic verification post-hoc, you're not "ensuring the correctness of your LLM's reasoning" (I went down this path a year ago); you're classifying whether the LLM's statistically driven pattern generation happens to correspond to correct logic. The LLM's output may be wrong 100% of the time, and your theorem prover simply acts as a classifier, ensuring nothing at all.
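To make the "classifier" framing concrete, here is a minimal sketch (my own toy example, using the z3-solver Python package; the formula standing in for an autoformalized LLM claim is hypothetical). The prover only labels the formula valid or invalid after the fact; nothing in this loop constrains what the LLM generates:

    # Post-hoc verification as classification: the prover labels the
    # autoformalized claim; it does not steer generation.
    from z3 import Ints, Implies, Not, Solver, unsat

    x, y = Ints("x y")

    # Stand-in for a formula autoformalized from an LLM's reasoning step.
    llm_claim = Implies(x > y, x + 1 > y)

    s = Solver()
    s.add(Not(llm_claim))  # the claim is valid iff its negation is unsat
    verdict = "valid" if s.check() == unsat else "not valid"
    print(verdict)  # a label on one output, not a guarantee about the model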
Yep, this is a genuine problem, and it's what we term the autoformalization gap in our follow-up paper (https://arxiv.org/abs/2505.20047).
Some LLMs are more consistent between text and SMT than others (Table 1, Figures 14 and 15).
You can do uncertainty quantification with selective verification to reduce the risk, e.g. as measured by the Area Under the Risk-Coverage curve in Table 4.
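For readers unfamiliar with the metric, here is a minimal sketch (my own toy code, not taken from the paper) of how an Area Under the Risk-Coverage curve is typically computed: sort answers by confidence, sweep the coverage level, and average the error rate of the retained prefix. Selective verification then means abstaining on the low-confidence tail:

    import numpy as np

    def aurc(confidences: np.ndarray, correct: np.ndarray) -> float:
        order = np.argsort(-confidences)      # most confident answers first
        errors = 1.0 - correct[order].astype(float)
        # Risk (error rate) among the top-k answers, for every coverage k.
        risk_at_k = np.cumsum(errors) / np.arange(1, len(errors) + 1)
        return float(risk_at_k.mean())        # lower is better

    # Toy data: confident answers are mostly right, so AURC is low.
    conf = np.array([0.95, 0.90, 0.60, 0.40, 0.20])
    corr = np.array([1, 1, 1, 0, 0])
    print(aurc(conf, corr))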
Well, if you understand that this is a "genuine problem," then what have you done to solve it? A quick look at the abstract of your follow-up paper does not reveal an answer.
And let me be clear: this is a major limitation that fundamentally breaks whatever you are trying to achieve. You start with LLM-generated text that is, by construction, unrelated to any notion of truth or factuality, and you push it through a verifier. Now you are verifying hot air.
It's like research into the efficacy of homeopathic medicine, and there is a lot of that, very carefully performed and with great attention to detail. Except all of that research is trying to determine whether doing nothing at all (i.e. homeopathy) has some measurable effect. Obviously the answer is no. So what could change that? Only making homeopathy do something instead of nothing. But that's impossible, because homeopathy is, by construction, doing nothing.
It's the same with LLMs. Unless you find a way to make an LLM generate text that is conditioned on some measure of factuality, you can verify the output all you like; the whole thing will remain meaningless.
Probabilistic constraints are all around us. You learn that the sine is the ratio of the length of the side of a right triangle opposite the angle to the length of the hypotenuse, so obviously the sine is always positive. Yet your thinking should be flexible enough to allow changing the definition to the ordinate of the point where the ray drawn from the origin at the given angle intersects the unit circle. So your knowledge, even the symbolic kind, can also be probabilistic.
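A quick way to see the two definitions come apart (plain Python, nothing beyond the standard library):

    import math

    # Unit-circle definition: sine is the ordinate of the point where the
    # ray from the origin at the given angle meets the unit circle.
    # It can be negative.
    print(math.sin(math.radians(210)))  # approximately -0.5

    # Right-triangle ratio definition: only makes sense for acute angles,
    # and is necessarily positive. E.g. a 3-4-5 triangle:
    opp, hyp = 3.0, 5.0
    print(opp / hyp)                    # 0.6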
You're thinking along the right track, but without formalization it goes nowhere fast. By layering differential geometry on top of probability, and then maybe category-theoretic logic on top of that, with each layer constraining the one below it and all layers cohering, you get somewhere. There is work that has been done in this area; I was recently interviewed by a journalist who published a high-level article on it in Forbes ("Why LLMs are failing"), and it links to the actual technical work (first to my high-level presentation, then to Prof. L. Thorne McCarty's work): https://www.forbes.com/sites/hessiejones/2025/09/30/llms-are...
Why is this being downvoted? The author acknowledged the problem and responded. Is anything wrong with that?