I think this is a cool tech demo. But the common thread I see in all of these "let the agent run free" harnesses is that the output is never something I would actually want to use, watch, or play.

I think minimizing the amount of human effort in the loop is the wrong optimization, and it's the reason we end up with "slop".

It's the dream of a lot of people to have a magic box that makes things you can sell, or enjoy for personal leisure. But LLMs are not the magic box, and there may never be a magic box. The sooner we accept that the magic box isn't in the room with us, the sooner we can start getting real utility out of LLMs.

TLDR: Human taste is more important than building things for the sake of building them.

Maybe OP could try an angle where, at various points, the process presents the user with 2-6 options and they choose their favourite. With a bit of intentional chaos in there, the user and the tool could discover interesting game concepts together and eventually build them as prototypes.