I think you're conflating mechanism with function/capability.
I'm not sure what I wrote that made you conclude I thought these models aren't learning anything from their RL training?! Let me say it again: they are learning to steer towards reasoning steps that led to rewards during training.
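To be concrete about the mechanism I mean, here's a toy sketch of the underlying policy-gradient idea (plain REINFORCE, no baseline): steps that appear in rewarded traces get made more likely. Everything in it is invented for illustration, including the discrete "reasoning steps" and the toy verifier; real RL fine-tuning operates on token sequences with much fancier machinery (PPO/GRPO, KL penalties, etc.):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    NUM_STEPS = 8   # hypothetical discrete "reasoning steps" the policy chooses among
    TRACE_LEN = 4   # steps per sampled reasoning trace
    GOOD_STEP = 3   # toy verifier: any trace containing this step earns reward 1

    # Stand-in for an LLM policy: just a table of logits over the possible steps.
    policy_logits = nn.Parameter(torch.zeros(NUM_STEPS))
    opt = torch.optim.Adam([policy_logits], lr=0.1)

    for _ in range(300):
        dist = torch.distributions.Categorical(logits=policy_logits)
        trace = dist.sample((TRACE_LEN,))                    # sample a reasoning trace
        reward = 1.0 if (trace == GOOD_STEP).any() else 0.0  # reward the whole trace
        # REINFORCE: scale the log-probs of the sampled steps by the reward, so
        # steps that appeared in rewarded traces become more likely next time.
        loss = -reward * dist.log_prob(trace).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Probability mass ends up concentrated on GOOD_STEP: the policy has learnt
    # to steer towards the step that led to reward during training.
    print(torch.softmax(policy_logits.detach(), dim=0))

The point is only the shape of the update: the "steering" is nothing more than reward-weighted likelihood, i.e. gradient ascent on whatever happened to get rewarded.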
The capabilities of LLMs, both with and without RL, are a bit counter-intuitive, and I think that comes down, at least in part, to the massive size of the training sets and the even more massive number of novel combinations of learnt patterns they can therefore potentially generate...
In a way it's surprising how FEW new mathematical results they've been coaxed into generating, given that they've probably encountered a huge portion of mankind's mathematical knowledge and can potentially recombine all of those pieces in at least somewhat arbitrary ways. You might have thought there would be results A, B and C hiding away in obscure mathematical papers that no human has ever thought to put together (just because of the vast number of such potential combinations), and whose combination would lead to some interesting new result.
If you are unsure yourself about whether LLMs are sufficient to reach AGI (meaning full human-level intelligence), then why not listen to someone like Demis Hassabis, one of the brightest and best-placed people in the field to have considered this, who says the answer is "no", and that a number of major new "transformer-level" discoveries/inventions will be needed to get there.