I gave Codex 5.4 Playwright MCP access to the site and a prompt of "Use Playwright CLI Skill to open https://playstarfling.com/ and load the game. Work out how to play it, and devise a strategy to win." After about half a dozen attempts it had figured the game out. Then I prompted it to "Score as much as you can." It wrote itself an auto-play script that just keeps going.
I stopped it running at 10866. That's currently the high score. I appreciate that this is pointless and proves nothing, but I've been experimenting with automating game testing (I work at a gaming company at the moment), so it felt like an opportunity to try an experiment.
Are you sure the script is actually testing the gameplay, given that it can see the entire source code of the game?
It started out using browser.click events and then switched to using browser.evaluate script injection. That's entirely valid for my use case.
Do tell! How did it play the game? Did you watch? Did it just take forever with every shot, or how did that play out with the LLM-induced latency?
It didn't exactly play it using the LLM. It used Playwright to execute code in the browser to work out how the game works, and then wrote a script to inject into the page to play it. It was basically a perfect AI, getting skip * 2 on every shot even after a hundred planets. I didn't expect it to do quite so well with only two prompts.
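This is not the script Codex wrote, just a rough sketch of the difference between the two approaches, using Playwright's Python sync API. The `window.game` object and its `fire()` method are made-up placeholders; I'm not claiming the game exposes anything with those names, and the click coordinates and angle are arbitrary.

```python
# Sketch only. `window.game` and `fire()` are HYPOTHETICAL names used for
# illustration; the real game's internals are not shown in this thread.
FIRE_JS = """
(angle) => {
  const g = window.game;                      // hypothetical global
  if (!g || typeof g.fire !== 'function') return false;
  g.fire(angle);                              // hypothetical method
  return true;
}
"""


def click_shot(page, x, y):
    """Approach 1: act like a player by sending a real mouse click."""
    page.mouse.click(x, y)


def inject_shot(page, angle):
    """Approach 2: run code inside the page and return what it returns."""
    return page.evaluate(FIRE_JS, angle)


def main():
    # Imported here so the helpers above can be used without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://playstarfling.com/")
        click_shot(page, 400, 300)     # arbitrary coordinates
        print(inject_shot(page, 0.5))  # arbitrary angle
        browser.close()
```

Calling `main()` needs Playwright and a browser installed. The click path only exercises what a player can reach through the UI, while the evaluate path can read and call anything in the page, which is why it can play "perfectly" and also why it tests less of the actual input handling.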