Twice in the text quotes Claude Code's F1 score as 32%, but the table shows the score is 37%. It's very likely that the actual score is 32% (because it is referenced 2 times, and a third time indirectly as the difference 'seven').
Oddly, this is a strong indication of the text being hand-written rather than LLM-assisted; it's very likely that a human made a mistake in creating the table.
> ... beating Claude Code (32%) ...
> ... GLM 5.2 ... beat Claude Code by seven points (39% vs. 32%).
> Rank | Configuration | Harness | F1
> ...
> 4 | Claude Code (Opus 4.6) | Claude Code SDK | 37%
Hello author here, or one of them anyway. I can confirm that it was hand written, 32% was combined all the Claude models (4.6, 4.7, 4.8) mushed into one score, 37% was Opus 4.6 specifically (which did the best)