Thanks for this. I made a tower defence game a while ago, and I had been considering applying an AI to the task of designing new waves and tuning hitpoints/speed/armour.

It made me think that one of the things the AI probably needs is a way to get a 'feel' for the game in motion. Perhaps a protocol is needed for encoding visible game state into tokens: terrain, game entity positions, and any other properties visible to the player. I don't think a straight autoencoder over the whole screen would work, but a per-game-element autoencoder might, producing a list of tokens.
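A rough sketch of what I mean by one-token-per-element (all the names and field choices here are my own assumptions, not any existing protocol): each visible element gets a type id plus quantised position and stats, and a frame is just the flat list of those.

```python
from dataclasses import dataclass

@dataclass
class ElementToken:
    kind: int    # e.g. 0 = terrain tile, 1 = creep, 2 = tower, 3 = projectile
    x: int       # grid-quantised position
    y: int
    hp: int      # 0 for elements without hitpoints
    speed: int   # quantised movement speed, 0 if static
    armour: int  # quantised armour, 0 if none

def encode_frame(elements):
    """Turn the engine's list of visible elements into a flat token sequence."""
    tokens = []
    for e in elements:
        tokens.append(ElementToken(
            kind=e["kind"],
            x=int(e["x"]),
            y=int(e["y"]),
            hp=int(e.get("hp", 0)),
            speed=int(e.get("speed", 0)),
            armour=int(e.get("armour", 0)),
        ))
    return tokens

# Example: one creep and one tower from a single frame
frame = [
    {"kind": 1, "x": 12, "y": 4, "hp": 80, "speed": 3, "armour": 1},
    {"kind": 2, "x": 10, "y": 5},
]
print(encode_frame(frame))
```

Whether the model then embeds these per element or per frame is an open question; the point is just that the engine can emit them cheaply alongside the rendered image.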

Then the game could provide an image of what the screen looks like, plus tokens fed directly out of the engine, to give the AI a notion of what is actually occurring. I'm not sure how much training a model would need to be able to use the tokens effectively. It's possible that an existing model's embedding space can already hold a representation of game state in a few tokens, in which case only fine-tuning might be needed. You'd 'just' need a training set of game logs with measurements of how much fun people found them. There's probably some intriguing information in there for whoever builds such a dataset: identifying player preference clusters would open the door to making variants of existing games for different player types.
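Purely as an illustration of the kind of record such a dataset might hold (every field name here is my own guess, not a real schema): one entry per play session, pairing the tokenised replay and wave parameters with a player-reported fun rating, plus a player id so preference clusters can be found later.

```python
training_example = {
    "session_id": "2024-05-01-0042",
    "frames": [  # token sequences for a few sampled frames of the replay
        [{"kind": 1, "x": 12, "y": 4, "hp": 80, "speed": 3, "armour": 1}],
    ],
    "wave_params": {"creeps": 20, "hp": 80, "speed": 3, "armour": 1},
    "fun_rating": 4,            # player-reported, e.g. on a 1-5 scale
    "player_id": "anon-7731",   # needed for clustering player preferences
}
print(training_example["fun_rating"])
```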