Oh, this is sick! I've been complaining about this exact problem for years. The "canary without real users" idea is brilliant - way better than just throwing your free-tier users under the bus and hoping for the best.

The thing that really got me was catching bugs in non-deterministic output. We've been struggling with this on LLM features, where traditional assertions just don't work. Having agents actually judge quality instead of looking for exact matches is such an obvious solution in hindsight.

Quick question though - how do you handle auth flows with MFA or OAuth redirects?