> The biggest trap is the hallucinated citation. It will easily insert an absolutely authentic-sounding quotation from another case that perfectly proves the point you are trying to make, then it'll make up an authentic name for it, e.g. United States v. Shenzhou Electronics Inc or whatever.
Naive question from an outsider: aren't there searchable databases of cases (with complete text) so that citations could be checked automatically, either by the same or an independent agent?
So, all of these cases are public records. The federal level stuff is all available quite openly on the web. The state stuff is a mixed nightmare of fifty different systems at the appellate level (which is the stuff that is usually cited). At trial court level you have (literally) 3000 different systems, most of which are not accessible for LLMs.
But yes, 100% LLMs should be able to check themselves. Another poster below brought up the other issue: you can check the citation and it's 100% correct, but it doesn't legally apply to what you are writing, and/or it doesn't mean what the LLM thinks it means in the limited context it has taken it from.
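To make the "check automatically" idea concrete, here is a rough sketch of what such a pass might look like. Everything in it is an assumption for illustration: `FAKE_DB` is an in-memory stand-in for whatever case-law source you actually have access to, the citations and opinion text in it are invented placeholders, and the regex only recognises a few federal reporter formats.

```python
import re

# Hypothetical stand-in for a case-law database: citation -> opinion text.
# Both the citation and the text are invented placeholders, not a real case.
FAKE_DB = {
    "9999 F.3d 1": "The court held that the contract was void for lack of consideration.",
}

# Very rough pattern for a few federal reporter citations (assumption:
# real citation formats are far more varied than this).
CITATION_RE = re.compile(
    r"\b\d{1,4} (?:F\.(?:2d|3d|4th)|U\.S\.|F\. Supp\.(?: 2d| 3d)?) \d{1,5}\b"
)


def extract_citations(text):
    """Return every reporter-style citation found in the draft, in order."""
    return [m.group(0) for m in CITATION_RE.finditer(text)]


def normalize(s):
    """Lowercase and collapse whitespace so quotes compare loosely."""
    return " ".join(s.lower().split())


def check_quote(citation, quote, db):
    """Classify a (citation, quote) pair against the database.

    Returns "missing" if the citation is not in the database,
    "quote_not_found" if the case exists but the quote is not in its text,
    and "ok" if the quote appears verbatim (ignoring case and whitespace).
    """
    opinion = db.get(citation)
    if opinion is None:
        return "missing"
    if normalize(quote) not in normalize(opinion):
        return "quote_not_found"
    return "ok"


if __name__ == "__main__":
    draft = "See 9999 F.3d 1 and 9998 F.3d 2."
    for cite in extract_citations(draft):
        print(cite, check_quote(cite, "void for lack of consideration", FAKE_DB))
```

Note this only catches the first failure mode (the case or the quote doesn't exist). It says nothing about whether the case actually applies or means what the model thinks it means, which is the harder problem.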
It depends on the jurisdiction. I'm based in France and all cases here are now freely available online to people and agents [1], but it's very recent for lower courts. However, I recently had to work on Texas case law and we had to purchase access to a (very expensive [2]) database since most of it wasn't public.
[1] https://www.legifrance.gouv.fr/
[2] https://legal.thomsonreuters.com/en/westlaw/plans-and-pricin...
US in a nutshell
It’s a band-aid solution because the model can get stuck in a refutation loop, where it argues a point by pulling up a contradicting source ad infinitum. The holy grail, which has not yet been reached, is figuring out how to dynamically align the model to be consistent with all the sources in the first place (and this is a problem of provenance rather than model design).
I’ve been doing AI legal research via a case law API with Claude Code for at least a year and I’ve never seen that happen.