This is actually interesting. Feels like we’re moving from “generate UI” to “validate UI,” which is a completely different problem. Curious how you handle edge cases where something looks correct but breaks in interaction?

The agent drives interactions through proofshot exec — clicks, typing, navigation — and each action gets logged with a timestamp synced to the video. So in the viewer you can scrub through and click on action markers to jump to specific moments. It captures what happened during interaction, not just what the page looked like at rest. I have recordings where the agent struggled (for instance, when it had to click toggle buttons). It was fascinating to watch: the agent just tried again and again, like a toddler figuring out how to use a keyboard, and after three tries figured it out on his/her own (trying not to misgender the babies of future AGI).

...you test the interaction too? That's what Playwright does, and LLMs are pretty capable of writing Playwright tests for interaction.
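For what it's worth, an interaction test for the toggle case mentioned above is only a few lines of Playwright (the URL and accessible name here are placeholders, not from the project being discussed):

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical example: URL and accessible name are placeholders.
test('toggle actually toggles', async ({ page }) => {
  await page.goto('https://example.com/settings');

  const toggle = page.getByRole('checkbox', { name: 'Dark mode' });
  await expect(toggle).not.toBeChecked();

  // Exercise the interaction itself, not just the rendered state.
  await toggle.click();
  await expect(toggle).toBeChecked();
});
```

The difference is that this asserts a fixed expectation you wrote up front, whereas a recording shows you what the agent actually did, including the retries.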