I'm skeptical of what "up to" 750 t/s really means. Maybe only if they make it extremely expensive, so that enough capacity frees up?
GPT‑5.3‑Codex‑Spark currently runs on Cerebras chips and it's giving me around 150 t/s. That's still very fast in relative terms, but nowhere near the 1,000 t/s they claimed at launch. (Also, it's not a very good model.)
That said, I'm super bought in to faster models being better for most use cases than smarter models.
If it's 150 t/s, that's barely faster than Nvidia GPUs, which batch a lot more and are a lot more cost-effective. Add in the Groq piece and Nvidia claims it can do 400 tokens/s.
Soon the bottleneck will be how fast your laptop can grep for a string.