I have a few questions about test frameworks that use AI services like this.
1) The examples always seem very generic: "Test login functionality", "Check if search works", etc. Do these actually work well at all once you step outside basic smoke-test use cases?
2) How do you prevent proprietary data from being read when you are just foisting snapshots over to the AI provider? There's no way I'd be able to use this in any kind of real application where data privacy is a constraint.
Good questions!
1) Beyond basic tests: You're right to be skeptical. This is best for quick exploratory testing during local development ("does my new feature work?"), not for replacing your test suite. Think of it as "scriptable manual testing": faster than writing Playwright tests by hand, but not suitable for comprehensive CI/CD test coverage.
2) Data privacy: Screenshots stay local in /tmp, but console output and the page content Claude writes tests against are sent to Anthropic. This is a local dev tool for testing with dummy data, not for production environments with real user data. It's the same privacy model as any AI coding assistant: if you wouldn't show Claude your production database, don't test against it with this.
Thanks. I keep seeing silver-bullet testing solutions pitched left, right, and center, and I keep wondering about these two points. Glad to see a project with realistic boundaries and expectations set. Would definitely give this a shot if I were working on a local project.
Another LLM-generated response. This is sad.
For 2, a lot of companies use AWS Bedrock to access Claude models instead of Anthropic, for exactly this reason. Amazon’s terms say they don’t log prompts or completions and don’t send anything to the model provider. If your production database is already hosted by AWS, it doesn’t seem like much additional risk.
> Do these actually work well at all once you step outside of the basic smoketest use cases?
Excellent question... No. Beyond basic kindergarten stuff, Playwright (with AI) quickly falls apart. Have some OAuth? Good luck configuring Playwright for your exact setup. Need to synthesize all the information available from logs and visuals to debug something? Good luck...