Hacker News

RLVR can also encourage hallucinations quite easily. Think of the SAT: a random guess is right 20% of the time, while "I don't know" is right 0% of the time. If you reward only the test score, you encourage guesswork. So good RL reward design is as important as ever.
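The guessing argument comes down to expected-value arithmetic. Below is a minimal sketch, assuming a single multiple-choice question with a reward of 1 for a correct answer. `wrong_penalty` and `abstain_reward` are hypothetical knobs for illustration, not parameters of any particular RLVR setup. With no penalty, guessing always beats abstaining. A penalty for wrong answers creates a confidence threshold below which "I don't know" becomes the better move.

```python
from fractions import Fraction


def expected_reward(p_correct, wrong_penalty=Fraction(0), abstain_reward=Fraction(0)):
    """Return (expected reward for answering, reward for abstaining).

    p_correct: the model's probability that its answer is right
               (1/5 for a blind guess among five options).
    wrong_penalty, abstain_reward: hypothetical reward-design knobs.
    """
    answer = p_correct - (1 - p_correct) * wrong_penalty
    return answer, abstain_reward


def preferred_action(p_correct, wrong_penalty=Fraction(0), abstain_reward=Fraction(0)):
    """Which action a reward-maximizing policy would prefer."""
    answer, abstain = expected_reward(p_correct, wrong_penalty, abstain_reward)
    if answer > abstain:
        return "answer"
    if answer < abstain:
        return "abstain"
    return "indifferent"


def confidence_threshold(wrong_penalty, abstain_reward=Fraction(0)):
    """Minimum p_correct above which answering beats abstaining.

    Solves p - (1 - p) * w > a  <=>  p > (a + w) / (1 + w).
    """
    return (abstain_reward + wrong_penalty) / (1 + wrong_penalty)


if __name__ == "__main__":
    blind_guess = Fraction(1, 5)
    # Reward only for correct answers: guessing wins.
    print(preferred_action(blind_guess))
    # A 1/4-point penalty makes a blind guess break even.
    print(preferred_action(blind_guess, wrong_penalty=Fraction(1, 4)))
    # A 1-point penalty makes abstaining strictly better.
    print(preferred_action(blind_guess, wrong_penalty=Fraction(1)))
    # With a 1-point penalty, answer only when more than 50% confident.
    print(confidence_threshold(Fraction(1)))
```

In this toy model, a pure "right or wrong" reward sets the threshold to 0, so any nonzero chance of being right justifies answering. Pricing in the cost of a wrong answer is what makes abstention rational.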

That being said, there are methods for training LLMs against hallucinations, and they do improve hallucination avoidance. But anti-hallucination capabilities are fragile and do not fully generalize. There's no (known) way to train an LLM to be fully aware of its own capabilities.

neonspark 3 days ago

I think what you say is true, and I think the same is true of humans. There is no known way to completely eliminate unintentional bullshit coming out of a human’s mouth. We have many techniques for reducing it, including critical thinking, but we are all susceptible to it, and I imagine we each do it many times a day without much concern.

We need to make these models much, much better, but it’s going to be quite difficult to bring the BS down even to human levels. And the BS will always be with us. I suppose BS is a natural side effect of any complex system, artificial or biological, that tries to navigate the problem space of reality and speak about it. These systems, sometimes called “minds”, are going to produce things that sound right but just aren’t true.

ACCount37 3 days ago

It's a feeling I can't escape: that by trying to build thinking machines, we glimpse more and more of how the human mind works, and why it works the way it does - imperfections and all.

"Critical thinking" and "the scientific method" feel quite similar to the "let's think step by step" prompt for early LLMs: more elaborate instructions, compensating for the subtler flaws of a more capable mind.