We use agents to navigate the app and make real-time decisions based on its state. I'd compare it to a manual QA engineer rather than to static e2e tests. We spent a lot of time on the harness to make sure the results are reliable. This lets you assert on dynamic content, such as AI-generated output. We also support validating email flows, since the agent can read its own email.
Fable (rip) is absurdly good at this, so it's a great time to build a product around it. You definitely need the harness, but it feels like it just turned the corner to being able to do really in-depth, edge-case work.
Do you handle heterogeneous environments and network connectivity simulation as well? I'm working on a mobile app, and occasionally having users lose just a request or two can put the state machine into unusual modes.
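To make that failure mode concrete, here is a rough Python sketch of the kind of thing I mean. Everything in it is made up for illustration (`FlakyTransport`, `SyncClient`, `count_stuck_clients` are hypothetical names, not from my app or from any testing tool): a transport that silently drops some requests, and a toy client with no timeout or retry that gets stuck when that happens.

```python
import random
from enum import Enum, auto
from typing import Optional


class SyncState(Enum):
    IDLE = auto()
    WAITING = auto()  # request sent, response not yet received
    SYNCED = auto()


class FlakyTransport:
    """Hypothetical transport that silently drops a fraction of requests."""

    def __init__(self, drop_rate: float, seed: int = 0) -> None:
        self.drop_rate = drop_rate
        self._rng = random.Random(seed)  # seeded so each run is reproducible

    def send(self, payload: str) -> Optional[str]:
        # Returning None models a request that was lost without any error.
        if self._rng.random() < self.drop_rate:
            return None
        return f"ack:{payload}"


class SyncClient:
    """Toy client whose state machine has no timeout or retry (assumption)."""

    def __init__(self, transport: FlakyTransport) -> None:
        self.transport = transport
        self.state = SyncState.IDLE

    def sync(self, payload: str) -> None:
        self.state = SyncState.WAITING
        response = self.transport.send(payload)
        if response is not None:
            self.state = SyncState.SYNCED
        # A dropped request leaves the client in WAITING: the unusual mode.


def count_stuck_clients(runs: int, drop_rate: float) -> int:
    """Run `runs` independent clients and count how many end up stuck."""
    stuck = 0
    for seed in range(runs):
        client = SyncClient(FlakyTransport(drop_rate, seed=seed))
        client.sync("profile-update")
        if client.state is SyncState.WAITING:
            stuck += 1
    return stuck
```

With a drop rate of 0 nothing gets stuck, and with a drop rate of 1 every client does. The interesting cases are the small rates in between, which is where I'd want an agent poking at the app.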
I feel like new AI model releases will only let our agents do more in-depth testing; the space still has a lot of room to grow. Quality assurance is far more complicated than just clicking around a UI.
Regarding the other question: not yet. For now, we have Chromium, iOS, and Android (latest versions of each), but we are working on adding more. Regarding network connectivity, it's coming soon (I have an open PR).