We were building voice agents across multiple platforms and had no way to regression test them. A prompt tweak on one node would silently break three other conversation paths. So we built voicetest.
It imports your agent config from any supported platform into a universal IR (AgentGraph), then runs autonomous multi-turn conversation simulations against it. A simulator LLM plays the caller, your agent graph handles the routing, and an LLM judge scores the transcripts against success criteria. It also supports deterministic rule tests for compliance checks: PII leakage, required disclosures, forbidden phrases.
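To make the deterministic rule tests concrete, here is a rough sketch of that kind of check run over a finished transcript. This is not voicetest's actual API: `Turn`, `check_transcript`, and the SSN regex are hypothetical names made up for illustration, and real PII detection would need much more than one pattern.

```python
import re
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "agent" or "caller"
    text: str


# Hypothetical PII rule: flags anything shaped like a US Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def check_transcript(turns, required_phrases, forbidden_phrases):
    """Return a list of rule violations; an empty list means the transcript passes."""
    # Only the agent's side of the conversation is subject to the rules.
    agent_text = " ".join(t.text for t in turns if t.role == "agent").lower()
    violations = []
    for phrase in required_phrases:
        if phrase.lower() not in agent_text:
            violations.append(f"missing required disclosure: {phrase!r}")
    for phrase in forbidden_phrases:
        if phrase.lower() in agent_text:
            violations.append(f"forbidden phrase used: {phrase!r}")
    if SSN_PATTERN.search(agent_text):
        violations.append("possible PII leak: SSN-like pattern in agent output")
    return violations


transcript = [
    Turn("agent", "Hi, this call may be recorded. How can I help?"),
    Turn("caller", "I want to check my account."),
    Turn("agent", "Sure, we guarantee approval. Your SSN on file is 123-45-6789."),
]

violations = check_transcript(
    transcript,
    required_phrases=["this call may be recorded"],
    forbidden_phrases=["guarantee approval"],
)
for v in violations:
    print(v)
```

Because checks like these are plain string and regex matching, they give the same answer on every run, which is why they suit compliance rules better than an LLM judge.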
Write tests once and they work across platforms. Import from Retell, export to VAPI, run the same test suite. It also converts formats between platforms if you're migrating.
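The reason one test suite can cover several platforms is that everything goes through a shared intermediate representation. A toy sketch of the idea follows. The config shapes ("format A" and "format B") are invented for illustration and are not the real Retell or VAPI schemas, and `ToyGraph` is a stand-in, not voicetest's actual AgentGraph class.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    prompt: str
    transitions: dict = field(default_factory=dict)  # condition -> target node id


@dataclass
class ToyGraph:
    entry: str
    nodes: dict  # node id -> Node


def import_format_a(config):
    """Parse the made-up 'format A' config into the shared graph."""
    nodes = {}
    for state in config["states"]:
        transitions = {e["when"]: e["to"] for e in state.get("edges", [])}
        nodes[state["name"]] = Node(state["name"], state["prompt"], transitions)
    return ToyGraph(entry=config["start"], nodes=nodes)


def export_format_b(graph):
    """Serialize the shared graph into the made-up 'format B' config."""
    return {
        "entryNode": graph.entry,
        "nodes": [
            {
                "id": n.id,
                "instructions": n.prompt,
                "routes": [
                    {"condition": c, "target": t} for c, t in n.transitions.items()
                ],
            }
            for n in graph.nodes.values()
        ],
    }


config_a = {
    "start": "greeting",
    "states": [
        {
            "name": "greeting",
            "prompt": "Greet the caller and ask what they need.",
            "edges": [{"when": "wants billing", "to": "billing"}],
        },
        {"name": "billing", "prompt": "Help with billing questions."},
    ],
}

graph = import_format_a(config_a)
exported = export_format_b(graph)
print(exported["entryNode"], [n["id"] for n in exported["nodes"]])
```

Tests written against the graph (node ids, prompts, transitions) never see either platform's format, so they keep working after an import or a migration.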
It has a CLI (for CI/CD), a web UI, a REST API, and a TUI. Results go into DuckDB so you can query them. It uses LiteLLM via DSPy, so it works with whatever provider you want.