Congrats on the launch. I've been fooling around with using my pipecat MCP (https://github.com/pipecat-ai/pipecat-mcp-server) with WebRTC. The WebRTC side is hooked into a webapp interface, and this lets me "talk" to different containers (projects) on my TrueNAS.

The web app is just a list of chat sessions across all my projects. It's modified to launch Claude Code daemons (borrowed from humanlayer/codelayer) and pipes the outbound STT from the WebRTC side into a chat session.
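For flavor, the daemon-launch side is roughly shaped like this. This is a minimal sketch only; the `claude` command, the helper names, and the stdin wiring are placeholders, not the actual humanlayer/codelayer internals:

```python
# Rough sketch: one headless Claude Code process per project/container,
# with the WebRTC STT text forwarded into its stdin and the chat session.
import asyncio


async def launch_claude_daemon(project_dir: str) -> asyncio.subprocess.Process:
    # "claude" stands in for however the daemon is actually wrapped.
    return await asyncio.create_subprocess_exec(
        "claude",
        cwd=project_dir,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )


async def forward_stt(proc: asyncio.subprocess.Process, transcript: str) -> None:
    # Whatever the STT pipeline emits gets appended to the chat session
    # and written to the daemon's stdin.
    proc.stdin.write((transcript + "\n").encode())
    await proc.stdin.drain()
```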

- MCP auth is via Auth0.

- The webapp itself is gated by a bearer token (rough sketch below).

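The gate itself is nothing fancy; something like this, assuming a FastAPI app (the framework and the env var name are assumptions, and the token check is simplified):

```python
# Minimal sketch of the bearer-token gate; not the actual webapp.
import os

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer()


def require_token(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> None:
    # Reject any request whose bearer token doesn't match the configured one.
    if creds.credentials != os.environ["WEBAPP_TOKEN"]:
        raise HTTPException(status_code=403, detail="bad token")


@app.get("/sessions", dependencies=[Depends(require_token)])
def list_sessions() -> dict:
    # Placeholder; the real endpoint lists chat sessions per project.
    return {"sessions": []}
```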
This setup by itself gets me pretty far. I'm not sure what more this is offering?

My STT/TTS models are local (Kyutai), and the voice agent's LLM between STT and TTS is used to determine some basic context: e.g. which project directories and MCP servers to select, and which skills to use for launching the daemons.
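That routing step between STT and TTS is basically one structured LLM call. A sketch, assuming an OpenAI-compatible local endpoint; the URL, model name, and JSON schema are illustrative, not my exact setup:

```python
# Sketch of the "router" between STT and TTS: ask a small local LLM to pick
# the project directory, MCP servers, and skills before launching a daemon.
import json
import urllib.request

ROUTER_PROMPT = """Given the user's request, reply with JSON only:
{"project_dir": "...", "mcp_servers": ["..."], "skills": ["..."]}"""


def route(transcript: str,
          llm_url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    body = json.dumps({
        "model": "local-router",
        "messages": [
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": transcript},
        ],
    }).encode()
    req = urllib.request.Request(
        llm_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return json.loads(reply)
```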

This sounds solid, similar to what we do! It sounds like this setup gets you most of the way there. We also have a mobile app + notifications. I haven't tried using a coding voice agent via MCP yet, though; I'll try that out soon!

Good to know it's similar. Oh, I actually do have a text box as well, but typing into it from the phone isn't very convenient. Too much typing, so I generally STT into the text box. I don't use it to code much unless I have specced it out and I know the spec is good. But then coding it up only takes a few minutes, no?

I spend my time tuning the voice+webapp experience: e.g. how well it explains things, whether it surfaces thinking tokens from Claude's tools properly, etc. The sweat, blood, and voice go into the `/create_research -> /create_plan` loop before `/implement_plan`. Sometimes I also copy the research and paste it into ChatGPT for review or comments.

I generally use the MCP to get it to follow commands and explain things to me to make progress in this cycle, and I often pause it and ask it to draw me a Mermaid sequence diagram for the events or a block diagram showing how the pieces fit together.