Neat! How do you handle state changes during tests, for example, in a todo app the agents are (likely) working on the same account in parallel or even as a subsequent run, some test data has been left behind or now data is not perhaps setup for a test run.
I’m curious if you’d also move into API testing too using the same discovery/attempt approach.
This is one of our biggest challenges, you're spot on! What we're working on taking this includes a memory layer that agents have access to - thus state changes become part of their knowledge and accounted for while conducting a test.
They're also smart enough to not be frazzled by things having changed, they still have their objectives and will work to understand whether the functionality is there or not. Beauty of non-determinism!