Willy from Twill here.
I love the idea of emailing agents like we email humans! Thank you for sharing your learnings:
1. Network constraints vary quite a bit from one enterprise customer to another, so right now this is something we handle on a case-by-case basis with them.
2. We came to the same conclusion. For sensitive credentials like LLM API keys, we generate ephemeral keys so the real keys never touch the sandbox.
3. Totally right, we support constrained tasks too (ask mode, automated CI fixes). We've gone back and forth on whether to go vertical-first or stay generic. We're still figuring out where the sweet spot is. The constrained tasks are more reliable today, but the open-ended ones are where teams get the most leverage.