Congrats on the launch. I've been fooling around with using my pipecat MCP (https://github.com/pipecat-ai/pipecat-mcp-server) with WebRTC. The WebRTC side is hooked into a webapp interface, and this lets me "talk" to different containers (projects) on my TrueNAS.

The web app is just a list of chat sessions across all my projects. It's modified to launch Claude Code daemons (borrowed from humanlayer/codelayer) and pipes the outbound STT from the WebRTC side into a chat session.
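For flavor, the daemon-launch side is roughly shaped like this. This is a minimal sketch only; the `claude` command, the helper names, and the stdin wiring are placeholders, not the actual humanlayer/codelayer internals:

```python
# Rough sketch: one headless Claude Code process per project/container,
# with the WebRTC STT text forwarded into its stdin and the chat session.
import asyncio


async def launch_claude_daemon(project_dir: str) -> asyncio.subprocess.Process:
    # "claude" stands in for however the daemon is actually wrapped.
    return await asyncio.create_subprocess_exec(
        "claude",
        cwd=project_dir,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )


async def forward_stt(proc: asyncio.subprocess.Process, transcript: str) -> None:
    # Whatever the STT pipeline emits gets appended to the chat session
    # and written to the daemon's stdin.
    proc.stdin.write((transcript + "\n").encode())
    await proc.stdin.drain()
```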

- MCP auth is via Auth0.

- The webapp itself is gated by a bearer token (rough sketch below).

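The gate itself is nothing fancy; something like this, assuming a FastAPI app (the framework and the env var name are assumptions, and the token check is simplified):

```python
# Minimal sketch of the bearer-token gate; not the actual webapp.
import os

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer()


def require_token(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> None:
    # Reject any request whose bearer token doesn't match the configured one.
    if creds.credentials != os.environ["WEBAPP_TOKEN"]:
        raise HTTPException(status_code=403, detail="bad token")


@app.get("/sessions", dependencies=[Depends(require_token)])
def list_sessions() -> dict:
    # Placeholder; the real endpoint lists chat sessions per project.
    return {"sessions": []}
```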
This setup by itself gets me pretty far. I'm not sure what more this is offering?

My STT/TTS models are local (Kyutai), and the voice agent's LLM between STT and TTS is used to determine some basic context: e.g. which project directories and MCP servers to select, and which skills to use for launching the daemons.
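That routing step between STT and TTS is basically one structured LLM call. A sketch, assuming an OpenAI-compatible local endpoint; the URL, model name, and JSON schema are illustrative, not my exact setup:

```python
# Sketch of the "router" between STT and TTS: ask a small local LLM to pick
# the project directory, MCP servers, and skills before launching a daemon.
import json
import urllib.request

ROUTER_PROMPT = """Given the user's request, reply with JSON only:
{"project_dir": "...", "mcp_servers": ["..."], "skills": ["..."]}"""


def route(transcript: str,
          llm_url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    body = json.dumps({
        "model": "local-router",
        "messages": [
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": transcript},
        ],
    }).encode()
    req = urllib.request.Request(
        llm_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return json.loads(reply)
```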

This sounds solid, similar to what we do! It sounds like this setup gets you most of the way there. We also have a mobile app + notifications. I haven't tried using a coding voice agent via MCP yet, though; I'll try that out soon!

Good to know it's similar. Oh, I actually do have a text box as well, but typing into it from the phone isn't very convenient. Too much typing, so I generally STT into the text box. I don't use it to code much unless I have specced it out and I know the spec is good. But then coding it up only takes a few minutes, no?

I spend my time tuning the voice+webapp experience: e.g. how well it explains things, whether it surfaces thinking tokens from Claude's tools properly, etc. The sweat, blood, and voice go into the `/create_research -> /create_plan` loop before `/implement_plan`. Sometimes I also copy the research and paste it into ChatGPT for review or comments.

I generally use the MCP to get it to follow commands and explain things to me to make progress in this cycle, and I often pause it and ask it to draw me a Mermaid sequence diagram for the events or a block diagram showing how the pieces fit together.