Inject some adversarial priming, as happens in actual usage, and you can probably get that number to 95% or higher.

Our experience with Lenz is that forcing a multi-step process, including adversarial debates, helps improve the verdicts.