I might suggest looking at Alibaba's open source AgentEvolver. It doesn't specifically target video games, but it's an agentic system designed around an OODA-loop-style evolutionary process rather than the usual train/inference split. It has potential and could be exciting to watch.
I like how they classify the subproblems of their work: environment / self-questioning -> task / self-questioning -> trajectory / self-evaluation. OODA-esque.
https://arxiv.org/abs/2511.10395 https://github.com/modelscope/AgentEvolver with thanks to Sung Kim, whose feed has been great: https://bsky.app/profile/sungkim.bsky.social/post/3m5xkgttk3...