I tried that out in my field of expertise to calibrate my expectations. ChatGPT invented multiple references to non-existent but plausibly titled papers attributed to me.
I think of that when asking questions about areas I don’t know.
That was about 18 months ago, so maybe this kind of hallucination is under control these days.
LLMs are good for tasks where you can verify the result. And you must verify the result unless you're just using it for entertainment.
I would use an agent (Codex) for this task: use the Pro model in ChatGPT for deep research to assemble the information and citations, then have Codex work through the citations systematically with a task list, web-searching to verify or correct each one. Codex can be used like a test suite.
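As a rough illustration of that verification loop, here's a minimal sketch in Python. It assumes the agent checks each citation against the Crossref API (which is real); the helper name, the crude title-match heuristic, and the sample citations are my own inventions, not part of the workflow above.

    import requests

    def citation_exists(title: str, author: str) -> bool:
        # Hypothetical stand-in for the agent's web-search step:
        # query the Crossref API for a bibliographic match.
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": f"{title} {author}", "rows": 3},
            timeout=10,
        )
        resp.raise_for_status()
        for item in resp.json()["message"]["items"]:
            found = (item.get("title") or [""])[0].lower()
            # Crude match heuristic; a real agent would also compare
            # authors, year, and venue.
            if title.lower() in found or found in title.lower():
                return True
        return False

    # Sample task list; in practice these come from the deep-research output.
    citations = [
        ("Attention Is All You Need", "Vaswani"),
        ("A Plausible But Nonexistent Paper", "Doe"),
    ]
    for title, author in citations:
        flag = "ok" if citation_exists(title, author) else "FLAG: verify manually"
        print(f"{title}: {flag}")

Running every citation through a check like this and flagging the failures for manual review is what "like a test suite" means here.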
Turns out Gell-Mann amnesia applies to LLMs too.
LLMann amnesia?