I have a draft doing this with text adventures: https://entropicthoughts.com/updated-llm-benchmark