Interesting approach using Signal for the transport layer. I've been working with real-time audio pipelines (chrome.tabCapture → Whisper), and picking an STT chunk size is always tricky: shorter chunks cut latency but hurt accuracy. What's the end-to-end latency like on a video call?