The final conclusion, though, stands without any justification - that LLM + RL will somehow outperform people at open-domain problem solving seems quite a jump to me.

I think the point is that it's practically impossible to correctly perform RLHF in open domains, so comparisons simply can't happen.

To be fair, it says "has a real shot at" and AlphaGo level. AlphaGo clearly beat humans at Go, so thinking that it would have a shot if you could replicate that doesn't seem crazy to me.

That only makes sense if you think Go is as expressive as written language.

And here I mean that it is the act of making a single (plausible) move that must match the expressiveness of language, because otherwise you're not in the domain of Go but the far less interesting "I have a 19x19 pixel grid and two colours".

AlphaGo has got nothing to do with LLMs though. It's a combination of RL + MCTS. I'm not sure where you are seeing any relevance! DeepMind also used RL for playing video games - so what?!