That's interesting! I guess the question is how they detected or simulated a model hallucinating in that regard?
Do you have a link to that article? I can't find anything of that nature with a shallow search.
This isn't from Anthropic, but here is a Python library that focuses on different ways of detecting hallucinations: https://github.com/IINemo/lm-polygraph (caveat emptor, I doubt this really works).
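For a rough sense of the kind of signal these uncertainty-based detectors build on, here's a minimal sketch. To be clear, this is not lm-polygraph's API, just a generic illustration of one common baseline (mean per-token entropy of the output distribution); the model choice, prompt, and the idea of using entropy as a "hallucination flag" are all assumptions on my part.

```python
# Generic sketch: mean per-token entropy as a crude uncertainty signal.
# NOT lm-polygraph's API; model name and prompt are arbitrary assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=False,
        return_dict_in_generate=True,
        output_scores=True,
    )

# out.scores holds one logits tensor per generated token
entropies = []
for step_logits in out.scores:
    probs = torch.softmax(step_logits[0], dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum()
    entropies.append(entropy.item())

mean_entropy = sum(entropies) / len(entropies)
generated = tokenizer.decode(out.sequences[0][inputs["input_ids"].shape[1]:])
print(f"generated: {generated!r}, mean token entropy: {mean_entropy:.2f}")
# Higher mean entropy = less certain model; a very crude hallucination flag.
```

The more elaborate estimators in libraries like this tend to layer things on top of signals like that (sampling multiple completions, comparing them, etc.), which is also where I'd expect most of the false positives to come from.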