Cool demo, but without tool calling this is basically a fast parrot. The traditional pipeline is slower, but at least you can plug in a real brain.

Voice-to-voice models can call tools. No need for TTS.