The SPAG paper is an interesting example of genuine reinforcement learning with language models that improves their performance on several hard reasoning benchmarks. https://arxiv.org/abs/2404.10642
The parts that are missing from Karpathy's rant are "at scale" (the researchers only ran 3 iterations of the algorithm, on small language models) and "in open domains" (I could be wrong about this, but IIRC they ran their games on a small set of common English words). But adversarial language games seem promising, at least.
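For concreteness, here's a rough sketch of the outer self-play loop as I understand it from the paper. Everything in it is a toy stand-in (the "model" is just a number, and play_taboo / offline_rl_update are hypothetical placeholders): in the real thing both roles are the same LLM under different prompts, bootstrapped with imitation learning, and the update is an offline advantage-weighted RL objective, not this crude nudge.

    import random

    # Toy stand-ins so the skeleton runs end to end: a "model" here is just
    # a number, and play_taboo / offline_rl_update are hypothetical
    # placeholders for the LLM game rollouts and the paper's offline
    # advantage-weighted update.
    def play_taboo(attacker, defender, target):
        """Stub for one adversarial game; returns (trajectory, winner)."""
        winner = "attacker" if random.random() < attacker else "defender"
        return ([f"<dialogue about {target!r}>"], winner)

    def offline_rl_update(model, episodes):
        """Stub update: nudge the toy 'model' toward its observed win rate."""
        win_rate = sum(w == "attacker" for _, w in episodes) / len(episodes)
        return 0.9 * model + 0.1 * win_rate

    def spag_train(model, targets, iterations=3, games_per_iter=1000):
        """Each iteration: self-play on random target words, then learn
        from the outcomes. The paper stops after 3 iterations."""
        for it in range(iterations):
            episodes = [
                play_taboo(attacker=model, defender=model,  # same model, both roles
                           target=random.choice(targets))
                for _ in range(games_per_iter)
            ]
            model = offline_rl_update(model, episodes)
            print(f"iteration {it + 1}: toy model = {model:.3f}")
        return model

    spag_train(model=0.5, targets=["apple", "river", "clock"])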
That’s a cool paper - but it seems like it produces better debaters, not better content? To truly use RL’s strengths, it would need to be a battle over content (model or world representation), not mere token-level battles.
I am not sure how that works at the prediction stage, since language isn’t the problem here.
I think the hypothesis is that "debating" via the right adversarial word game may naturally select for better reasoning skills. There's some evidence for that in the paper, namely that it (monotonically!) improves the model's performance on seemingly unrelated reasoning benchmarks like the ARC dataset. Which is mysterious! But yeah, it's much too early to tell, although IIRC the results have already been replicated, so that's something.
(by the way, I don't think "debating" is the right term for the SPAG game - it's quite subtle and isn't about arguing for a point, or rhetoric, or anything like that)
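To make that concrete, here's my loose reading of one game's turn structure (the game is Adversarial Taboo: the attacker knows a secret target word and tries to lure the defender into saying it, while the defender tries to infer it). ToyAgent and the tie handling are my assumptions, not the paper's code:

    import random

    class ToyAgent:
        """Stand-in for an LLM player; in SPAG both roles are the same
        model under different prompts."""
        def __init__(self, vocab):
            self.vocab = vocab

        def speak(self, history, target=None):
            return random.choice(self.vocab)

        def guess(self, history):
            # Occasionally commit to a one-shot guess at the target word.
            return random.choice(self.vocab) if random.random() < 0.2 else None

    def run_episode(attacker, defender, target, max_turns=5):
        """One Adversarial Taboo game, per my reading of the paper's rules."""
        history = []
        for _ in range(max_turns):
            msg = attacker.speak(history, target)
            if target in msg:                 # attacker may never say the word itself
                return "defender"
            history.append(("attacker", msg))
            msg = defender.speak(history)
            if target in msg:                 # lured into uttering it: attacker wins
                return "attacker"
            guess = defender.guess(history)
            if guess is not None:             # a wrong guess loses the game
                return "defender" if guess == target else "attacker"
            history.append(("defender", msg))
        return "tie"                          # tie handling is my assumption

    vocab = ["apple", "river", "clock", "cloud"]
    print(run_episode(ToyAgent(vocab), ToyAgent(vocab), target="river"))

Note there's no judge scoring arguments anywhere: the win condition is purely about which side gets the other to reveal or utter the word, which is why "debate" undersells how odd the objective is.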