We'll need to wait for real benchmarks, but based on OpenAI's marketing, Instant is their latency-optimized offering. For a voice interface you don't actually need high tok/s, because speech is slow; time to first token matters much more.
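To put a rough number on "speech is slow", here is a back-of-envelope sketch. The figures are my own assumptions, not benchmarks: about 150 spoken words per minute and about 1.3 tokens per English word. The function names are made up for illustration.

```python
# Assumed figures, not measurements: conversational speech at roughly
# 150 words per minute, and roughly 1.3 tokens per English word.
WORDS_PER_MINUTE = 150.0
TOKENS_PER_WORD = 1.3


def speech_token_rate(wpm: float = WORDS_PER_MINUTE,
                      tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Tokens per second a model must sustain to keep up with speech playback."""
    return wpm / 60.0 * tokens_per_word


def playback_headroom(model_tok_per_s: float) -> float:
    """How many times faster than speech playback the model generates."""
    return model_tok_per_s / speech_token_rate()


if __name__ == "__main__":
    print(f"tokens/s needed for speech: {speech_token_rate():.2f}")
    # 50 tok/s is a hypothetical model speed, picked only for illustration.
    print(f"headroom at 50 tok/s: {playback_headroom(50.0):.1f}x")
```

Under those assumptions, playback only consumes about 3 tokens per second, so even a modest generation speed stays well ahead of the audio. What the user actually notices is the pause before the first word, which is why time to first token dominates.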