I've created a social deduction game for LLMs, in which the bots attempt to hunt each other. It's a Mafia-style group Turing test: the models are told to find out who the bot is - when, in fact and unbeknownst to them, they are all bots. I did this a while back, so the models aren't the newest, and they are all non-thinking (for speed and token costs). Et voilà.
Pretty cool, a few small UI nits:
- The conversation has a one-left, one-right pattern. IMO it would be better to have all messages on the left side, like a true group chat. The right could be used for a game commentator or controller - just an idea.
- Maybe color the entire message text based on the AI model. It's hard to tell which AI is which because the name is quite small and the tiny dot is hard to differentiate.
Interesting setup. Social-deduction feels like a clever proxy for multi-agent coordination and deception. One trade-off I’m curious about is how much the results reflect prompt design vs actual model behavior. Have you tried swapping prompts or role constraints to see how stable the outcomes are?
The inverted game, in which bots are instructed to find the human hiding in the LLM conversation (although no human is present), is here: https://hiding-robot.vercel.app/human The leaderboard is different, but I didn't run it enough times to iron out all the kinks.
All bots get the same prompt and context. Are you suggesting that a specific prompt wording might be helping or hurting specific models? I haven't come across any suggestions that specific models should be prompted differently, though this might be true.