> We're also launching GPT‑5.6 Sol on Cerebras at up to 750 tokens per second in July, bringing frontier intelligence to customers at unprecedented speed.
This is really exciting. I work on voice AI, and we're still using 4.1/4.1 mini since none of the frontier models come close on latency. I'm excited to be able to build more interactive experiences; I think it'll unlock new ways of working with these models.
Also building voice agents and have found GPT 5.4 with no thinking to be the sweet spot for latency vs intelligence vs cost.
GPT 5.5 with no reasoning is actually slightly faster, and much smarter, but too expensive.
What I'm really looking forward to are the next-gen speech-to-speech models. gpt-realtime-2 is almost there, but not quite good enough for our use case. 5.4 actually beats it on answer latency, even when cascaded with STT/TTS.
What latency are you seeing with 5.4 no reasoning? And where have you landed for STT and TTS solutions?