Yup, hallucinations are still a big problem for LLMs.

Nope, there's no reliable solution for them, as of yet.

There's hope that hallucinations will be solved by someone, somehow, soon... but hope is not a strategy.

There's also hype about non-stop progress in AI. Hype is more of a strategy... but it can only work for so long.

If no solution materializes soon, many early-adopter LLM projects/trials will be cancelled. Sigh.

Trying to "fix hallucinations" is like trying to fix humans being wrong. It's never going to happen. We can maybe iterate towards an asymptote, but we're never going to "fix hallucinations."

No: an LLM that doesn't confabulate will certainly get things wrong in some of the same ways that honest humans do - being misinformed, confusing similar things, "brain" damage from bad programming or hardware errors. But LLM confabulations like the one we're discussing only occur in humans when they're being sociopathically dishonest. A lawyer who makes up a court case is not a "human being wrong," it's a human lying, intentionally trying to deceive. When an LLM does it, it's because it is not capable of understanding that court cases are real events that actually happened.

Cursor's AI agent simply autocompleted a bunch of words that looked like a standard TOU agreement, presumably based on the thousands of such agreements in its training data. It is not actually capable of recognizing that it made a mistake, though I'm sure if you pointed it out directly it would say "you're right, I made a mistake." If a human did this, making up TOU explanations without bothering to check the actual agreement, the explanation would be that they were unbelievably cynical and lazy.

It is very depressing that ChatGPT has been out for nearly three years and we're still having this discussion.

Since you've made a throwaway account to say this, I don't expect you to actually read this reply, so I'm not going to put much effort into writing it, but essentially this reflects a fundamental lack of understanding of humans, brains, and knowledge in general, and ChatGPT being out for three years is completely irrelevant to that.

Not OP, but I found that response compelling. I know that humans also confabulate, but it feels intuitively true to me that humans won't unintentionally make something up out of whole cloth with the same level of detail at which an LLM will hallucinate. A human might say "oh yeah, there's a library for drawing invisible red lines," but an LLM might give you "working" code implementing your impossible task.

I've seen plenty of humans hallucinating many things unintentionally, so this does not track. Some people believe there's an entity listening when you kneel and talk to yourself; others will swear on their lives that they saw aliens, that they were abducted, etc.

Memories are known to be made up by our brains, so even events that we witnessed will be distorted when recalled.

So I agree with GP: that response shows a pretty big lack of understanding of how our brains work.

My startup is working on this fundamental problem.

You can try out our early product here: https://cleanlab.ai/tlm/

(free to try, we'd love to hear your feedback)

Tested the free chat. The chatbot gave a slightly incorrect answer, and the trustworthiness rater gave it a score of 0.749 and said the answer was completely incorrect, which was not actually the case. It seems more confusing to get two answers that are each somewhat wrong.

One workaround for RAG setups, mentioned in a podcast I listened to, is to employ a second LLM agent to check the work of the first. The first LLM is required to cite its sources, and the checker agent then tries to locate those sources and judge whether the response is actually grounded in them or is a hallucination; see the sketch below.
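A minimal sketch of that checker pattern, assuming an OpenAI-compatible chat API. The model name, prompts, JSON citation format, and the toy retrieve() helper are illustrative assumptions of mine, not anything the podcast or this thread specified:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

MODEL = "gpt-4o-mini"  # assumed model name; swap in whatever you use


def retrieve(query: str) -> dict[str, str]:
    """Stand-in for the RAG retrieval step: return candidate docs keyed by id.
    In a real system this would query your vector store or search index."""
    return {
        "tou-v3": "Terms of use, version 3: one license covers up to 3 devices ...",
        "faq-12": "FAQ entry 12: device limits are enforced per account ...",
    }


def answer_with_citations(query: str, docs: dict[str, str]) -> dict:
    """First LLM: answer the question and cite which doc ids support the answer."""
    context = "\n\n".join(f"[{doc_id}] {text}" for doc_id, text in docs.items())
    resp = client.chat.completions.create(
        model=MODEL,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Answer using ONLY the provided documents. Reply as JSON: "
                '{"answer": "...", "citations": ["doc ids you relied on"]}')},
            {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return json.loads(resp.choices[0].message.content)


def citations_check_out(query: str, result: dict, docs: dict[str, str]) -> bool:
    """Second LLM: verify the cited docs exist and actually support the answer."""
    cited = result.get("citations", [])
    if not cited or any(doc_id not in docs for doc_id in cited):
        return False  # missing or fabricated citations -> treat as a hallucination
    evidence = "\n\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id in cited)
    verdict = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": (
                "You are a strict fact checker. Reply with exactly one word: "
                "SUPPORTED or UNSUPPORTED.")},
            {"role": "user", "content": (
                f"Question: {query}\n\nAnswer: {result.get('answer')}\n\n"
                f"Cited evidence:\n{evidence}\n\n"
                "Is the answer fully supported by the cited evidence?")},
        ],
    )
    return verdict.choices[0].message.content.strip().upper().startswith("SUPPORTED")


if __name__ == "__main__":
    question = "How many devices does one license cover?"
    docs = retrieve(question)
    result = answer_with_citations(question, docs)
    if citations_check_out(question, result, docs):
        print(result["answer"])
    else:
        print("Flagged as a possible hallucination; escalate or retry.")
```

Of course, the checker LLM can itself be wrong, so this reduces rather than eliminates hallucinations, and every query now costs two model calls.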

Every LLM output is a hallucination. Sometimes it happens to match reality.