I would say local LLMs are already good enough for this. The GPU being quite busy with rendering right when it's needed for LLM inference could be a problem, but not an insurmountable one.
I wouldn't ever want a game to use it for the core story writing, because it's pretty important that the story stays consistent and can't be derailed. But for less serious NPC interactions, or something like an RPG scenario, it's such a great fit.
I also wouldn't want a single-player game to rely on remote inference, because those servers will get turned off eventually and then your game stops working.
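To make the local-inference point concrete, here's a minimal sketch of a game pulling an NPC line from a locally hosted model. It assumes a llama.cpp server (llama-server) running on its default port and uses its /completion endpoint via libcurl; the persona, prompt, and numbers are all illustrative, not a real integration - check the field names against whatever build you're actually running.

    // Minimal sketch: fetch an NPC bark from a local llama.cpp server.
    // Assumes llama-server is listening on its default http://localhost:8080.
    // Build with: g++ npc.cpp -lcurl
    #include <curl/curl.h>
    #include <iostream>
    #include <string>

    static size_t collect(char* data, size_t size, size_t nmemb, void* out) {
        static_cast<std::string*>(out)->append(data, size * nmemb);
        return size * nmemb;
    }

    int main() {
        // Keep the model on rails: fixed persona and facts go in the prompt,
        // and a short completion budget stops it from wandering far.
        std::string body = R"({
            "prompt": "You are Brann, a gruff blacksmith. Reply in one short line.\nPlayer: Any work for me?\nBrann:",
            "n_predict": 48,
            "temperature": 0.7
        })";

        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL* curl = curl_easy_init();
        if (!curl) return 1;

        std::string response;
        struct curl_slist* headers =
            curl_slist_append(nullptr, "Content-Type: application/json");
        curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/completion");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

        CURLcode rc = curl_easy_perform(curl);
        if (rc == CURLE_OK)
            std::cout << response << "\n";  // JSON; generated text is in "content"

        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return rc == CURLE_OK ? 0 : 1;
    }

Since everything stays on localhost, this keeps working long after any vendor's API would have been shut down.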
I think games should generally move toward using the GPU less for graphics and more for general computation. Not just AI computation - those fancy GPUs are a big resource that simulation games could be taking advantage of and just... aren't.
(Yes, this is a Paradox callout. Give me less fancy particle effects in Vic3 and use the GPU for computing pop updates faster!)
(Probably the biggest barrier to this is the lack of a convenient C++/C#-level cross-manufacturer compute API. Vulkan is a bit too low-level for game devs to work with, OpenCL kind of sucks, and CUDA is NVIDIA-only.)
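As a rough illustration of the kind of per-entity simulation tick that maps well onto a GPU, here's a toy pop-update kernel. Everything in it is made up (the Pop struct, the growth rule, the constants - none of it is Vic3's actual model), and it's written in CUDA, which of course demonstrates the exact portability problem above.

    // Toy sketch: one simulation tick over a million hypothetical "pops",
    // one thread per pop. Embarrassingly parallel, so it's a natural GPU fit.
    // Build with: nvcc pops.cu
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    struct Pop {
        float size;
        float wealth;
        float needsMet;   // 0..1, fraction of needs satisfied this tick
    };

    __global__ void updatePops(Pop* pops, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        Pop p = pops[i];
        p.size   *= 1.0f + 0.001f * (p.needsMet - 0.5f);  // grow or shrink
        p.wealth += p.size * 0.01f * p.needsMet;          // accumulate wealth
        pops[i] = p;
    }

    int main() {
        const int n = 1 << 20;  // a million pops
        std::vector<Pop> host(n, Pop{100.0f, 10.0f, 0.8f});

        Pop* dev = nullptr;
        cudaMalloc(&dev, n * sizeof(Pop));
        cudaMemcpy(dev, host.data(), n * sizeof(Pop), cudaMemcpyHostToDevice);

        updatePops<<<(n + 255) / 256, 256>>>(dev, n);

        cudaMemcpy(host.data(), dev, n * sizeof(Pop), cudaMemcpyDeviceToHost);
        cudaFree(dev);
        printf("pop[0] size after one tick: %f\n", host[0].size);
        return 0;
    }

The real version would be messier (pops interact through markets, so you need reductions and multiple passes), but the per-pop arithmetic itself is exactly the shape of work GPUs chew through.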
For story writing, you'd generate into the framework described at https://news.ycombinator.com/item?id=45134144