Well, if you understand that this is a "genuine problem", then what have you done to solve it? A quick look at the abstract of your follow-up paper does not reveal an answer.

And let me be clear that this is a major limitation that fundamentally breaks whatever you are trying to achieve. You start with some LLM-generated text that is, by construction, unrelated to any notion of truth or factuality, and you push it through a verifier. Now you are verifying hot air.

It's like research into the efficacy of homeopathic medicine, of which there is plenty, much of it very carefully performed and with great attention to detail. Except all of that research is trying to determine whether doing nothing at all (i.e. homeopathy) has some kind of measurable effect or not. Obviously the answer is no. So what could change that? Only making homeopathy do something instead of nothing. But that's impossible, because homeopathy is, by construction, doing nothing.

It's the same thing with LLMs. Unless you find a way to make an LLM generate text that is conditioned on some measure of factuality, you can verify the output all you like; the whole thing will remain meaningless.