There's a number of little games and a major mod for Skyrim that implements this. I've spent a bit of time playing with the Skyrim mod.
There are some significant issues with it at the moment. One is that you have to train on vast swathes of text to get an LLM, and it's difficult to remove things after the fact. If you cooperate with the AI and stay "in Skyrim" with what you say to them it works out OK, but if you don't cooperate it becomes clear that Skyrim NPCs know something about Taylor Swift and Fox News, just to name two examples. LLMs in their current form basically can't solve this.
The LLMs are also prone to writing checks the game can't cash. It's neat that the NPCs started talking about a perfectly plausible dungeon adventure they went on in a location that doesn't exist but "felt" perfectly Skyrim-esque, but there are clearly some non-optimal aspects to that too. And again, this is basically not solvable with LLMs as they are currently constituted.
Really slick experiences with this I think will require a generational change in AI technology. The Mantella mod is fun and all but it would be hard to sell that at scale right now as a gaming experience.
It's solvable, it's just a lot of work. You need guardrails, i.e. multiple sanity-check passes (maybe there are other techniques, but I found this to be the most effective).
One concrete example I'm sure these Skyrim mods aren't using is: enums in structured outputs [1] with a finite list of locations/characters/topics/etc that are allowed to be discussed. The AI is not allowed to respond with anything that is not in the enum. So you can give it a list of all the locations in the game in a huge array and it would be forced to pick one.
[1] https://platform.openai.com/docs/guides/structured-outputs#a...
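Roughly what that looks like in practice (a minimal sketch, assuming an OpenAI-style JSON-schema response format; the location list and helper names here are made up for illustration, and the actual API call is omitted):

```python
import json

# Finite list of allowed locations. A real mod would dump every
# cell/location name from the game files into this list.
SKYRIM_LOCATIONS = ["Whiterun", "Riften", "Solitude", "Bleak Falls Barrow"]

# JSON schema the model is forced to conform to: the "location"
# field may only take a value from the enum.
response_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "npc_reply",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "dialogue": {"type": "string"},
                "location": {"type": "string", "enum": SKYRIM_LOCATIONS},
            },
            "required": ["dialogue", "location"],
            "additionalProperties": False,
        },
    },
}

def validate_reply(raw: str) -> dict:
    """Sanity-check pass: parse the model's JSON output and reject
    any location that isn't in the allowed enum."""
    reply = json.loads(raw)
    if reply["location"] not in SKYRIM_LOCATIONS:
        raise ValueError(f"hallucinated location: {reply['location']}")
    return reply

# A reply naming a real in-game place passes the check:
print(validate_reply('{"dialogue": "Meet me there.", "location": "Riften"}'))
```

You'd pass `response_schema` as the `response_format` of the chat completion request, then run the validation pass over whatever comes back as a belt-and-suspenders check.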
It seems like a guard model paired with RAG could help here. A guard model could filter out references to current events or anything outside the Skyrim universe, while RAG could be used to ground the NPCs' dialogue in actual in-game content. That way if the model tries to spin up a dungeon or location, it first checks against the game's data to confirm it actually exists before surfacing it to the player.
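A toy sketch of those two checks (the game data, the banned-topic list, and the function names are all assumptions for illustration; a real guard pass would use a second model rather than keyword matching, and the grounding lookup would query the game's actual data files):

```python
# In-game locations extracted from game data (hypothetical sample).
KNOWN_LOCATIONS = {"Whiterun", "Riften", "Bleak Falls Barrow"}

# Out-of-universe topics the guard pass should catch (sample).
OUT_OF_UNIVERSE = {"taylor swift", "fox news"}

def guard_filter(line: str) -> bool:
    """Guard pass: reject dialogue referencing out-of-universe topics."""
    lowered = line.lower()
    return not any(topic in lowered for topic in OUT_OF_UNIVERSE)

def ground_location(mentioned: str):
    """RAG-style grounding: only surface a location if it exists in
    the game's data; otherwise return None so the line can be
    dropped or regenerated."""
    return mentioned if mentioned in KNOWN_LOCATIONS else None

assert guard_filter("The dragons circle Whiterun at dusk.")
assert not guard_filter("Have you heard the new Taylor Swift song?")
assert ground_location("Bleak Falls Barrow") == "Bleak Falls Barrow"
assert ground_location("Caverns of Unmade Lore") is None
```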
Feel like they could probably be used in text based adventures. Doki Doki Literature Club, despite the silly name, is a very unusual game that might really benefit from this.
This is something fine tuning should be able to improve. The main caveats are the same as always: dataset collection, labeling, and training/experimentation time.
Improve, yes, fix, no. I was actually already using a model fine-tuned for Skyrim.
I didn't go into it in detail, but it isn't even that I got the NPCs to start babbling about Taylor Swift. What it was was just that they knew she was a musician, and as such, might be at the tavern. That's very hard to remove.
ahhh yeah I see what you mean