Using progressive disclosure with Codex is a fascinating way to handle complexity, but game physics are notoriously difficult to validate via text-based checklists. Since collision detection and movement can be subtle or visual, how did your Playwright setup distinguish between "working" mechanics and edge cases like glitching through walls? I'm curious if the Implement -> Evaluate loop ever got stuck cycling on a specific bug where the agent couldn't satisfy the test criteria without human intervention. Did you have to define specific tolerance thresholds for the physics engine to prevent false positives in the evaluation phase?
Incredibly, I didn't do anything. I just told Codex to use the Playwright CLI, told it what to check (in plain English), and it did its thing. Looking at its log I can see that it was "playing" the game and defining its own test conditions, such as whether the player/NPC is *not* on one of the "collidable" tiles, whether the NPC is "going over the edge" of a collidable area, whether it's facing the wrong way, etc. Sometimes it found bugs: for example, while running gravity checks it discovered that one of the movements wasn't working correctly, and it went ahead and fixed it.
So essentially it used the CLI to read all the x,y coordinates, speed, and timing, took screenshots, and combined those together.
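To give a sense of the kind of check the agent was inventing, here's a minimal sketch of a "player is not inside a collidable tile / not going over the edge" test. Everything here is hypothetical: the tile map, `TILE_SIZE`, and the `player` object are made-up stand-ins for whatever state the agent actually read out of the real game via the CLI.

```javascript
// Hypothetical sketch of agent-style collision checks.
// Tile map, tile size, and player shape are assumptions, not the real game's.

const TILE_SIZE = 16;
// 1 = collidable (wall), 0 = walkable
const tileMap = [
  [1, 1, 1, 1],
  [1, 0, 0, 1],
  [1, 0, 0, 1],
  [1, 1, 1, 1],
];

function tileAt(x, y) {
  const col = Math.floor(x / TILE_SIZE);
  const row = Math.floor(y / TILE_SIZE);
  if (row < 0 || row >= tileMap.length || col < 0 || col >= tileMap[0].length) {
    return 1; // treat out-of-bounds as collidable
  }
  return tileMap[row][col];
}

// Check 1: the player must not be standing inside a collidable tile.
function insideWall(player) {
  return tileAt(player.x, player.y) === 1;
}

// Check 2: the player must not be about to go over the edge of the
// walkable area; probe one tile ahead in the direction of movement.
function aboutToLeaveWalkable(player) {
  const nextX = player.x + Math.sign(player.vx) * TILE_SIZE;
  const nextY = player.y + Math.sign(player.vy) * TILE_SIZE;
  return tileAt(nextX, nextY) === 1;
}

const player = { x: 36, y: 20, vx: 1, vy: 0 }; // walkable tile, moving right
console.log(insideWall(player));           // false: on a walkable tile
console.log(aboutToLeaveWalkable(player)); // true: wall one tile to the right
```

In the actual run these reads would come from Playwright (e.g. evaluating game state in the page and taking screenshots) rather than a hardcoded map; the point is that the pass/fail conditions themselves were authored by the agent, not by me.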
My takeaway from this is: just let the agent do it. Trying to dictate specific conditions and checks actually lowers the agent's performance. Simply give it a guide.