The main problem here is that hallucination suppression doesn’t generalise. We can penalise models for incorrect answers on a wide range of questions, but this doesn’t lead to the emergence of a coherent worldview, which, coupled with logical abilities, is the only true remedy against hallucinations. With current architectures, hallucinations will likely persist on open-domain tasks forever.
> We can penalise models for incorrect answers on a wide range of questions, but this doesn’t lead to the emergence of a coherent worldview, which, coupled with logical abilities, is the only true remedy against hallucinations
I don't think anyone is trying to add "a coherent worldview" by reducing hallucinations; I'm not sure how that could even realistically be an aim.
What people want is for the models to stop giving confident answers that are clearly incorrect. Yes, it won't lead to "a coherent worldview", but it would at least stop wasting people's time if the model said "You know what, what you said doesn't make sense / isn't clear, did you mean ...?" or even "I'm not sure" or "I don't know".
Currently, if you start from a wrong premise and ask a model to do something, it will more often than not just go ahead and do it, misunderstandings or not. Models seem optimized to never push back unless you prompt for that, and most seem to favor "I'm just gonna assume X" over taking a step back and figuring out how to avoid assuming. Again, that is unless you prompt against that behaviour or steer the model into a different workflow.
Model outputs don't have a confidence score.
I don't think I claimed so either? Or maybe I misunderstand the point you're trying to make.
Even if they did, it wouldn't be of much use: correct or not, the output was the likely output 100% of the time.
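To make that concrete, here is a minimal sketch of a single greedy decoding step. It is not any real model's API: `softmax`, `greedy_step`, the three-word vocabulary, and the logits are all made up for illustration, and the logits are deliberately chosen so that the wrong answer to "What is the capital of France?" scores highest.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_step(logits, vocab):
    """Pick the most probable token and return it with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best], probs[best]

# Hypothetical toy values: the incorrect token "Lyon" has the highest logit.
vocab = ["Paris", "Lyon", "Berlin"]
toy_logits = [1.0, 3.0, 0.5]

token, prob = greedy_step(toy_logits, vocab)
print(token, round(prob, 3))
```

Under these assumed numbers the step emits "Lyon" with a probability above 0.8. The number measures how peaked the distribution is, not whether the answer is true, so reading it as a confidence score would flag this wrong answer as a confident one.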