That’s a cool paper, but it seems like it produces better debaters, not better content? To truly use RL’s strengths, it would need to be a battle of content (model or world representation), not mere token-level battles.
I’m not sure how that would work at the prediction stage, as language isn’t the problem here.
I think the hypothesis is that "debating" via the right adversarial word game may naturally select for better reasoning skills. There's some evidence for that in the paper: the training (monotonically!) improves the model's performance on seemingly unrelated reasoning benchmarks like ARC. Which is mysterious! But yeah, it's much too early to tell, though IIRC the results have already been replicated, so that's something.
(by the way, I don't think "debating" is the right term for the SPAG game - it's quite subtle, and it isn't about arguing for a point, or rhetoric, or anything like that)