Hacker News

Calypso: LLMs as Dungeon Masters' Assistants [pdf]

85 points by azhenley a day ago | 28 comments

I think that technology is really the last thing I want in a dungeons and dragons game. I prefer only pen and paper to be in a pen and paper game. And while it is an obvious problem to be solved by LLMs, an LLM will never recreate the fun of using a central casting book to create a backstory [1].

[1] https://open.substack.com/pub/worldofkamenlandia/p/the-world...

vunderba a day ago

So story time.

I've been hosting a DnD campaign with a group of college friends for almost a decade at this point. Since we've all moved away, we use Tabletop Simulator to play weekly.

In one of my adventures, the players met a creature named Dorian with the same backstory as the classic tale The Picture of Dorian Gray. Later, the players discovered the secret to the creature's immortality was an indestructible painting of the creature.

Nearly a month before encountering this creature, my players had explored a random dungeon I’d made where one of the rooms had a huge, ornate mirror on the wall. It made everything in the reflection appear older, covered in a layer of cobwebs and dust.

Although this was not the primary solution to defeating Dorian, I laid subtle clues to see if anyone would remember that far back. Well, one of them did in fact remember the dungeon. They stole the painting and brought it back to the dungeon along with a secondary mirror. They then placed the mirrors across from each other along with the painting to create a sort of “hall of mirrors” effect on the painting itself, which caused the painting of Dorian to accelerate in age—creating a feedback loop that infinitely aged Dorian (think drinking from the wrong Holy Grail).

It was amazingly satisfying to all my players, and it's also the exact sort of thing that I don't think the current batch of SOTA LLMs (even with the temperature cranked to 1) would ever think of in a MILLION years.

roenxi a day ago

> It was amazingly satisfying to all my players, and it's also the exact sort of thing that I don't think the current batch of SOTA LLMs (even with the temperature cranked to 1) would ever think of in a MILLION years.

That actually seems like it'd be a pretty easy one for an LLM to get to. I tried

    You are a highly intelligent and creative D&D campaign planner. Imagine a creature that is immortal with mechanics like The Picture of Dorian Gray - its soul is protected in a painting and that gives its body immortality. Please brainstorm some interesting ways to defeat this antagonist that might connect to other aspects of a campaign, suggesting new ideas.

and on the 2nd run on a small local LLM got "*Reflections and Mirrors*: The creature's soul is reflected in mirrors and reflective surfaces, making them vulnerable to attacks. The party must use mirrors to their advantage, luring the creature into a trap or using them to distract it while they attack." and "* *The Corruption of Beauty*: The immortal creature is a symbol of beauty and corruption. To defeat it, the party must find a way to strip it of its beauty and corruption, using it against itself. This could involve using a magical artifact that amplifies the creature's own corruption, or finding a way to expose its true nature and render it powerless.".

It seems quite reasonable that a large modern LLM could come up with exactly that idea in far less than a million years. Art, mirrors and defacing art are all pretty standard sounding themes for handling that sort of monster.

vunderba a day ago

I disagree - a generic suggestion to use a "reflection to defeat Dorian" isn't at all the same thing as standing up two mirrors to create a hall of mirrors effect on a magic mirror that "ages" the world around it. The key difference is in the level of specificity.

Furthermore, the true test of the LLM would have been to devise both deciding to use a "Dorian Gray" type creature along with an appropriate mechanism for its demise.

IMHO, you've guided it by posing that this monster exists within the world. I'd like to see an entire campaign devised by an LLM - no guidance necessary. I don't know what small model you are using (Mistral, Vicuna, etc.) but try asking it to create a list of monster-based encounters/puzzles.

This is why I think the current batch of LLMs aren't really capable of being more than an assistant at best for writing. Interestingly, I actually think AI Dungeon, released back in 2019, was far more capable in this regard, and IIRC it used a significantly older model - GPT-2.

roenxi a day ago

I added "Be really specific" to the prompt and on the second go got:

11. *The Art of Decay*: The Portrait's soul is tied to the painting, but what if the players could use the painting's power to accelerate the decay of The Portrait's physical form? This could involve using a powerful magical effect or finding a way to imbue the painting with a magical property that would accelerate the decay process.

in the brainstorm. I don't think it'd take a million years; it's got the concepts and it is trying to put them together with only 5 attempts so far; and that is only 2 GB of weights on a relatively cheap GPU.

As for coming up with an entire campaign that copies an idea from Wilde, well it depends and I'd agree that is a task that would want a human in the loop. It does seem difficult to do in one prompt - but I think a million years is wildly overestimating the novelty of the campaign you're describing. It is actually damn hard to come up with good, novel ideas and the odds are the LLMs have literally seen variants of the Dorian Gray idea 100 times.

vunderba a day ago

It's just describing the fact that the players need to find a way to age the portrait. At the risk of being a bit glib, I mean, duh. And its solution is the rather pedestrian "powerful magical effect or finding a magical way" jazz hands pile of claptrap.

I will admit that a million years was a bit of hyperbole.

bee_rider a day ago

I’m only barely familiar with the actual underlying story—is aging the portrait actually obvious? I thought the solution was to destroy it. Isn’t the whole point of the figure that it ages (instead of Dorian)?

Eisenstein a day ago

The fact is that a small LLM can come up with scenarios that would be kind of like it, and you are dismissing that because you want humans to be the only things capable of originality. In fact you lifted an entire aspect of the campaign from a classic novel, and the magic mirror concept is pretty much ingrained in cultural tropes, so I'm not sure why you think that a system which has as its basis the entirety of human written output, and which was designed to predict more output, couldn't come up with it.

lukan a day ago

" and you are dismissing that because you want humans to be the only things capable of originality."

I have not seen convincing cases of LLMs being capable of originality. But they do have a lot of originality in their training data.

jl6 a day ago

But then, human artists have a lot of training and influences too, so it's not so easy for humans to be truly original either (if that is a thing).

lukan a day ago

And then, if you are too original, no one will like it either...

I don't think there is something "truly original" but in this concrete case, I suspect something very similar was in the training data.

vunderba a day ago

I'm not dismissing LLMs, but there's a reason that the current crop of LLMs aren't writing hit scripts for TV shows, books, etc. The suggestion to use a "magical effect" is the equivalent of asking an LLM, "How can I break into this password protected computer?", and it responds, "You should use an exploit.".

ocimbote a day ago

I would disagree, but only because I think this is not the right prompt to test against, and hence not the right question.

While you're asking to find creative ways to get rid of the creature, I think what LLMs are unable to do (at this point?) is to come up with the idea of an aging mirror, let it sit for a while, and only then get back to it when the players meet the character it connects to.

The dungeon master did not follow a fixed track of events but rather picked up interesting, somewhat random contents and moments from the campaign and used them to create new story lines.

That doesn't seem like something an LLM would do easily.

FloorEgg 20 hours ago

It's not something an LLM would do with simple ChatGPT-type prompts, but it's something I can imagine building an agentic system to do. It's not trivial but seems feasible with current-day LLMs.

You would design the system to have this exact quality (among many others), where clues are dropped earlier in the quest line for later quests. It's a matter of breaking up the prompts and iteratively refining the outputs.
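One way to picture the "clues dropped earlier for later quests" idea is a small ledger that remembers what has been planted and feeds unused details into a later prompt. This is only a sketch under assumptions: `PlantedElement`, `CampaignLedger`, and the prompt wording are hypothetical names made up for illustration, and no LLM is actually called here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlantedElement:
    """A detail shown to the players that has not been used yet (hypothetical)."""
    name: str
    description: str
    session: int
    paid_off: bool = False


@dataclass
class CampaignLedger:
    """Hypothetical ledger of planted details, kept between sessions."""
    elements: List[PlantedElement] = field(default_factory=list)

    def plant(self, name: str, description: str, session: int) -> None:
        self.elements.append(PlantedElement(name, description, session))

    def unresolved(self, current_session: int, min_gap: int = 2) -> List[PlantedElement]:
        # Only surface details that have "sat for a while" (at least min_gap sessions).
        return [e for e in self.elements
                if not e.paid_off and current_session - e.session >= min_gap]

    def pay_off(self, name: str) -> None:
        for e in self.elements:
            if e.name == name:
                e.paid_off = True

    def callback_prompt(self, encounter: str, current_session: int) -> str:
        # Builds the text for one refinement step; sending it to a model is left out.
        lines = [f"Upcoming encounter: {encounter}",
                 "Earlier details the players have seen but not yet used:"]
        for e in self.unresolved(current_session):
            lines.append(f"- {e.name} (session {e.session}): {e.description}")
        lines.append("Suggest ways one of these details could matter in the encounter.")
        return "\n".join(lines)
```

The point of the design is that the "memory" lives outside the model: each prompt stays small, and the ledger decides which old details are eligible to come back.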

derektank a day ago

I would really like to see what a reasoning model with access to player character information, resources available in the location, and the monster manual could do. One of the hardest things as a DM, in my experience, is creating a balanced encounter without fudging. This has always made it hard for me to justify presenting a truly deadly encounter, which I feel has lowered the stakes of the game. It seems like it should be possible to create a system that knows the strengths/weaknesses of a party and that could create a challenging but not overwhelming encounter most of the time.
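For a rough feel of what "knows the strengths/weaknesses of a party" could mean without any LLM, here is a minimal Monte Carlo sketch that estimates how often a party wins a fight. Everything in it is an assumption for illustration: the `Combatant` fields, the stat values, and the combat loop are a deliberately simplified model (party acts first, random targets, no crits, spells, healing, or fleeing), not the actual D&D rules.

```python
import random
from dataclasses import dataclass


@dataclass
class Combatant:
    """Simplified, hypothetical stat block."""
    hp: int
    armor_class: int
    attack_bonus: int
    damage_die: int  # damage on a hit is 1dN


def simulate_once(party, monsters, rng, max_rounds=100):
    """Return True if the party wins one simulated fight."""
    party_hp = [c.hp for c in party]
    monster_hp = [c.hp for c in monsters]
    for _ in range(max_rounds):
        # The party always acts first in this simplified model.
        for side, side_hp, foes, foes_hp in (
            (party, party_hp, monsters, monster_hp),
            (monsters, monster_hp, party, party_hp),
        ):
            for i, attacker in enumerate(side):
                if side_hp[i] <= 0:
                    continue  # downed combatants do not act
                targets = [j for j, hp in enumerate(foes_hp) if hp > 0]
                if not targets:
                    break
                j = rng.choice(targets)
                if rng.randint(1, 20) + attacker.attack_bonus >= foes[j].armor_class:
                    foes_hp[j] -= rng.randint(1, attacker.damage_die)
        if all(hp <= 0 for hp in monster_hp):
            return True
        if all(hp <= 0 for hp in party_hp):
            return False
    return False  # treat a stalemate as a loss


def win_rate(party, monsters, trials=1000, seed=0):
    """Fraction of simulated fights the party wins."""
    rng = random.Random(seed)
    return sum(simulate_once(party, monsters, rng) for _ in range(trials)) / trials
```

A DM (or a tool) could sweep the number of monsters and pick the count where the estimated win rate lands in a "challenging but not overwhelming" band. The estimate is only as good as the model, which is exactly the caveat the replies about player behavior raise.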

Cthulhu_ a day ago

I'm confident the first games trying to have an AI as game master are being built, if not demonstrated already.

And I'm also confident that they will not make for good games because the players will find and exploit loopholes. I think LLMs in (video) games are here and here to stay, but they will only be actually beneficial to a game if they are tightly restricted and limited in scope. (no having your in-game partner write your code for you)

theshrike79 a day ago

> One of the hardest things as a DM, in my experience, is creating a balanced encounter without fudging

Only in D&D where you have so many variables in every direction it's practically impossible.

An easy encounter becomes hard if the players decide to be miserly with their abilities and items.

And a hard one will be easy if someone decides to go all out and use That Item to one-shot the encounter.

rappatic a day ago

> One of the hardest things as a DM, in my experience, is creating a balanced encounter without fudging

Just fudge. I know it might seem dishonest, but I think I fudge at least a little bit in probably 80% of my encounters. The most important thing isn't accuracy from the DM's perspective, it's fun and accuracy from the players' perspective. Unless the players catch wind of the fudging, it literally only has upsides:

- You waste less prep time fine-tuning stat blocks and can spend more time on the interesting and material aspects of prep, like designing NPCs, dungeons, etc.

- It makes the combat encounters more interesting, because encounters that would be super one-sided in either direction are instead close and nail-biting

- It allows you to end a boring encounter quicker or prolong unexpectedly interesting encounters

- You can create cool moments where a monster fails an attack at a crucial moment or succeeds when the odds are stacked against them

...etc. I think well-executed fudging is a complete win-win situation, as long as your players don't find out.

vunderba a day ago

Agreed. There's a reason a DM rolls behind a screen.

Additionally, I always set up encounters with a possible exploit/vulnerability that keenly observant players might notice, which will significantly reduce the difficulty of the encounter. If they fail to figure it out, the encounter is far more challenging but still well within the players' capabilities.

ajuc a day ago

Balance is hard mostly because the possible results are too binary. Either you win or you're dead. Stakes are so high that you tend to err on the side of safe but boring, or you risk a TPK.

Not only is this boring (it gives players fewer opportunities for making high-stakes decisions during the fight) - it also makes it harder for GMs to balance encounters.

If you let players and enemies decide to flee or negotiate at any point - the encounters can be much more deadly without turning into TPKs. Players now have more decisions to make on each round (and they can debate during the fight if they should run or not - which is potential for a great roleplaying moment).

And when they decide to run - a successful retreat is an interesting tactical challenge by itself, allowing players to use abilities and combos that they seldom need during a traditional fight.

One of my favorite moments of roleplay was when our barbarian was arguing with our curious druid about whether to check out a very sketchy haunted manor. Eventually they went in and there was a huge battle: the druid (against the wishes of the rest of the team) poured blood into a well and a demon appeared. Half the party started running, the other half was fighting the demon's minions, the demon started bargaining with the druid for his soul, and the barbarian started to run but after 2 turns turned back and tried to fight the demon, while the rest of the party also went back and disabled the druid to save him.

Eventually the party escaped from the manor (killing the demon was out of the question, it was obviously too powerful).

Our DM was always telling us he never balances the encounters - it's on us to escape in time :) And there were MANY dead PCs. Some players were on their 3rd character by the end of the 1.5-year campaign.

Resurrecting a dead PC was a major plot point - we had a noble lady from one of the most important NPC noble houses in our party, and she died. We tried resurrecting her, but one of the players had to sacrifice something (we rolled d100 and only that player managed over 95; to persuade the god to resurrect the character he had to promise to become a priest of that god - and he did multiclass because of this). The god was basically Loki, and he tricked us - the player character was resurrected as undead :)

protocolture a day ago

> Balance is hard mostly because the possible results are too binary. Either you win or you're dead. Stakes are so high that you tend to err on the side of safe but boring, or you risk a TPK.

Fairly solved problem outside of DND tbh.

ajuc a day ago

I don't think so, you can do this well in D&D, and you can do it badly in narration-focused RPGs with mechanics allowing for various degrees of success. It's more about DM ability to let the party "fail forward" than about any particular system.

Yes it's easy to fall into the traditional boring "here's 5 orcs, now fight to death" trap, but it's more about the D&D culture than about the system itself.

D&D wasn't my first system (it wasn't popular in Poland - we mostly played WFRP) and I certainly did fall into that trap a lot early on.

protocolture 14 hours ago

In my experience DND has a very thin edge where the good experience exists. Outside of that it's quite variable.

I can't comment on Warhams, but other RPGs I have played don't really care about encounter difficulty. And I am not even talking about narrative-only stuff. Even in Savage Worlds I can go and find 10 wild cards or 20 wild cards or come up with almost any Ace, and it's just a matter of player approach as to whether the encounter is difficult or simply deadly. To really create a TPK situation you would need to design enemies specifically to do that.

gavmor 10 hours ago

I've been using an LLM as DM Assistant for almost a year, now, by way of EchoDelphi[0], a tool we built at a public weekend hackathon[1].

It listens to the game, then sends the transcript for inference. I've been prompting the LLM to generate "threats" and "setbacks" based on the current scenario, and that way I always have something handy when the dice[2] indicate fortune's turn.

I use this for about three hours a week, 50+ weeks in a row. Pretty good for a weekend hack!

0. Https://EchoDelphi.github.io

1. Thank you Hacker Dojo, Mtn. View

2. https://3.bp.blogspot.com/-FyXSpsn-z28/WgfgacmVdtI/AAAAAAAAA...
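The "transcript in, threats and setbacks out" loop described above can be sketched as a prompt-building step. This is not EchoDelphi's code: the thread does not show its prompt or pipeline, so `build_assistant_prompt`, its parameters, and the wording are hypothetical, and the call to an actual model is intentionally left out.

```python
def build_assistant_prompt(transcript_lines, max_lines=40, n_ideas=3):
    """Turn the most recent table talk into a request for threats and setbacks.

    Hypothetical helper: keeps only the last `max_lines` lines so the prompt
    reflects the current scene rather than the whole session.
    """
    if max_lines <= 0:
        raise ValueError("max_lines must be positive")
    recent = transcript_lines[-max_lines:]
    scene = "\n".join(recent)
    return (
        "You are assisting a game master during a live tabletop session.\n"
        "Recent table talk:\n"
        f"{scene}\n\n"
        f"Suggest {n_ideas} threats and {n_ideas} setbacks that fit the current scene. "
        "Keep each to one sentence."
    )


if __name__ == "__main__":
    demo = ["GM: The bridge creaks under your weight.",
            "Player: I cut the rope behind us."]
    print(build_assistant_prompt(demo))
```

Keeping the output to a short list matches the use described: something handy to glance at when the dice call for a complication, not a replacement for the GM's judgment.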

Kichererbsen a day ago

I like to use LLMs to create random tables that I then use to spark my imagination. You can easily get o1 to create a bunch of random tables on a topic. Create multiple columns that you can then mix and match results from - or just roll multiple times on the same column.

I also find the image generation great for getting a "feel" for what a room / location might look like so I can improvise describing it a bit better.

LLMs are _really_ good at finding words that "go well together" - so use it to come up with a bunch of words to describe the atmosphere of a particular location or the mannerisms of a particular NPC.
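The mix-and-match use of multi-column random tables mentioned above is easy to sketch once a table exists. The table contents below are made up for illustration (in practice they would be whatever the LLM generated), and `roll_row` and `roll_column` are hypothetical helper names.

```python
import random

# Hypothetical table: each column is an independent list of entries.
TAVERN_TABLE = {
    "atmosphere": ["smoky", "raucous", "hushed", "damp"],
    "patron": ["a one-eyed sailor", "a nervous scribe", "twin bards", "an off-duty guard"],
    "complication": ["a missing purse", "a rigged dice game", "a sealed letter", "a brewing brawl"],
}


def roll_row(table, rng):
    """Roll once on every column independently and combine the results."""
    return {column: rng.choice(entries) for column, entries in table.items()}


def roll_column(table, column, times, rng):
    """Roll several times on a single column."""
    return [rng.choice(table[column]) for _ in range(times)]


if __name__ == "__main__":
    rng = random.Random()
    print(roll_row(TAVERN_TABLE, rng))
    print(roll_column(TAVERN_TABLE, "complication", 2, rng))
```

Rolling each column independently is what gives the combinatorial variety: three columns of four entries already yield 64 possible prompts for the imagination.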

protocolture a day ago

I used GPT-3 to help write my Halloween one-shot a few years ago.

The outcome was pretty good. As a language model it had quite a good understanding of tone and context. In fact it ended up adding (without my noticing, I was quite tired) a whole layer of subtext. After the game my players were like "Ok now explain the goo!" and it turns out that, across 2 handouts and 3 scene descriptions, GPT had added linked, veiled descriptions of some space goop that I hadn't planned for.

ofirg a day ago

I'm building a game with a similar idea. Encounters are controlled by the AI. There is a classic RPG system built around it, and human-generated content for the world and story.

ormintos a day ago

Sounds like an interesting mix. When you say encounters, do you mean the AI controls the generation of the opponents the party fights or the behaviour of the opponents?