Hello, author here, or one of them anyway. I can confirm that it was handwritten. The 32% was all the Claude models (4.6, 4.7, 4.8) mushed into one combined score; the 37% was Opus 4.6 specifically (which did the best).