> In a blind evaluation of nearly 3,000 anonymized comparisons, professors rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups.

75% win rate seems pretty good!

Paper link: https://law.stanford.edu/wp-content/uploads/2026/06/salinas_...

I wonder to what degree the AI was just better at communicating. My experience with attorneys is that they are often some of the worst writers.

The writing is always fluid and grammatically flawless. This carries much more weight with us than we believe. I know the illusion well from decades of grading college papers. Many of the highest-quality students use English as a second language, and I know this, but an American well trained in writing, grammar, and spelling always gives an impression of superiority. (Being well trained in writing, grammar, spelling, etc. is of course a real merit, which is how the illusion forms: it is basically an illusion of global 'intelligence'.)

Yeah, a 75% win rate is a ~200-point Elo difference, which is quite massive.
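For anyone who wants to check the arithmetic, here's a quick sketch. It assumes the standard logistic Elo model on the usual 400-point scale and treats the 75% as an expected score with no ties; the function names are mine, not from the paper. Under those assumptions 75% works out to about 191 points, so "~200" is the right ballpark.

```python
import math


def elo_gap_from_win_rate(p: float) -> float:
    """Rating gap implied by expected score p under the logistic Elo model."""
    if not 0.0 < p < 1.0:
        raise ValueError("win rate must be strictly between 0 and 1")
    # Invert E = 1 / (1 + 10^(-d/400))  =>  d = 400 * log10(p / (1 - p))
    return 400.0 * math.log10(p / (1.0 - p))


def win_rate_from_elo_gap(d: float) -> float:
    """Expected score for the stronger side given a rating gap d."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))


if __name__ == "__main__":
    print(round(elo_gap_from_win_rate(0.75), 1))   # 190.8
    print(round(win_rate_from_elo_gap(200.0), 3))  # 0.76
```

Going the other way, a full 200-point gap would correspond to roughly a 76% win rate.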

I do wish they'd used some more objective criteria. Simply being preferable is one of the things LLMs have been trained for since the beginning, hence their sycophantic nature.

Maybe sycophantic nature is a good fit for the legal system. A successful lawyer once told me that the most important thing is to know your judge. Objectivity isn't a big thing in court. They'll cite random newspaper articles as evidence and throw out expert opinions - if they like. There might be a way to appeal - but that road often is not functional.

What criteria would you use for judging legal arguments?

The arguments need to be based on actual law, and any cited reference cases need to be real.

There have been a lot of news stories about lawyers using AI and then getting in trouble for citing hallucinated laws or cases. It doesn't matter if the AI response is "preferred" over the human one if it gets thrown out under the scrutiny of a real case.

Who's gonna determine that? A bunch of law professors?

But did they? Or did they just go off which answer felt better? Did they put in any work to actually confirm the answer? Or did the busy law professors just click through and move on with their lives?

Maybe checking whether the case law it cited was real or imagined? Just one idea, IANAL.

Well, they had data on whether an answer would be harmful to students' learning. 3.5% of AI answers were rated harmful, versus 12% of law professor answers.