I'm also building voice agents, and have found GPT 5.4 with no reasoning to be the sweet spot for latency vs. intelligence vs. cost.
GPT 5.5 with no reasoning is actually slightly faster, and much smarter, but too expensive.
What I'm really looking forward to are the next-gen speech-to-speech models. gpt-realtime-2 is almost there, but not quite good enough for our use case. 5.4 actually beats it on answer latency, even when cascaded with STT/TTS.
What latency are you seeing with 5.4 no reasoning? And where have you landed for STT and TTS solutions?
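
For anyone comparing the two setups, here is roughly how I'd add up answer latency for a cascaded pipeline against a speech-to-speech model. It is only a sketch: the stage breakdown (STT finalization, LLM time to first token, TTS time to first audio) is my assumption about what matters, the names `CascadedTurn` and `faster_option` are made up, and the numbers are placeholders, not measurements of any model mentioned above.

```python
from dataclasses import dataclass


@dataclass
class CascadedTurn:
    """Per-stage latency for one turn of an STT -> LLM -> TTS pipeline, in ms."""
    stt_final_ms: float        # end of user speech -> final transcript
    llm_first_token_ms: float  # transcript sent -> first LLM token
    tts_first_audio_ms: float  # first text chunk -> first audio byte

    def answer_latency_ms(self) -> float:
        # With streaming, the user hears audio once every stage has
        # produced its first output, so the stage latencies simply add.
        return self.stt_final_ms + self.llm_first_token_ms + self.tts_first_audio_ms


def faster_option(cascaded: CascadedTurn, s2s_first_audio_ms: float) -> str:
    """Return which setup produces the first audio sooner."""
    if cascaded.answer_latency_ms() < s2s_first_audio_ms:
        return "cascaded"
    return "speech-to-speech"


# Placeholder numbers for illustration only.
turn = CascadedTurn(stt_final_ms=200.0, llm_first_token_ms=350.0, tts_first_audio_ms=150.0)
print(turn.answer_latency_ms())
print(faster_option(turn, s2s_first_audio_ms=900.0))
```

The point is just that a cascaded stack can still win if each stage streams and the three first-output times sum to less than the speech-to-speech model's time to first audio.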