I can't help it; the first thought that came to mind was "Huh...talk about sheer senseless brute force." Why use a Large Language Model on something as clearly defined in scope as a game, instead of a model designed and trained for that task/ruleset? Sure, there's the argument of not having to train that model, but OTOH, "decent harnesses" does some very heavy lifting there...

I think it's a compelling argument. You would need a large dataset of completed games on which to train, which may have something to do with why the games considered solved by AI are also among those for which a very rich, heavily annotated corpus of completed games in algebraic notation exists.

Of course - but in practice you won't be aiming for a fully "solved" game or that level of player skill for something like Civ - and even then, I severely doubt an LLM can realistically hope to even get in the vicinity unless the aforementioned "harness" does something similar as part of its heavy lifting.