Using gpt-5.4-mini in off-peak hours already feels like super-speed to me. That's probably no more than 100-150 tok/s. I can't imagine 750!

I've always eyed Cerebras but never had a use for it that would justify paying for the API directly. Although now that I think about it, trying out the API would probably cost less than a subscription for a month...

Try gpt-5.3-codex-spark - it's 1000 TPS and, in my experience, more capable than 5.4 mini.

If you have a subscription it's a different pool of usage.

Used it. Very fast, but it has a tiny context window and weak reasoning (good for quick, simple code changes).

MIMO 2.5 Pro ultraspeed has a 1M window. 1,000 tok/sec is great for planning since you can have a rapid conversation with a lot of turns.

Agreed, 1000 tok/s just fills up the context window (which is big by 2004 standards) super fast. But it seems like 5.3-spark was just a taste of what’s to come.
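
For a rough sense of scale, here is a back-of-the-envelope sketch of how long nonstop generation takes to fill a window. The 1M window and the 150 and 1,000 tok/s speeds come from the comments above; the 128,000-token window and the helper name `seconds_to_fill` are made up for illustration. It ignores prompt and tool-output tokens, which also count toward the window.

```python
def seconds_to_fill(context_tokens: int, tokens_per_second: float) -> float:
    """Seconds of continuous generation needed to emit `context_tokens` tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return context_tokens / tokens_per_second


# 1_000_000 tokens and 150 / 1_000 tok/s are figures quoted in the thread;
# 128_000 is a hypothetical smaller window, included only for comparison.
for window in (128_000, 1_000_000):
    for speed in (150, 1_000):
        minutes = seconds_to_fill(window, speed) / 60
        print(f"{window:>9,} tokens at {speed:>5,} tok/s: {minutes:6.1f} min")
```

So at 1,000 tok/s, a 1M window is roughly 17 minutes of uninterrupted output, versus nearly two hours at 150 tok/s.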

2004 standards? O.o

In 2004, I took a class where we trained "language models" that were bigram word models, on an archive of a couple years of the Wall Street Journal.

I remember someone who literally announced they were dropping the class to the whole room at the end of a lecture, saying "This isn't AI!!!"

1904

Back when we were kids, we would get 0 tokens/sec _if we were lucky_

The ChatGPT subscription gives you access to the -spark model(s) in Codex, which are blazing fast (but pretty dumb) and, I think, run on Cerebras hardware too.

Is this specifically in Codex? I've been trying to use the models for months on opencode, then pi, but it says ChatGPT subscriptions don't have access to it. I was under the assumption that OpenAI doesn't lock down their models based on harness, a la Claude Code.

What plan are you on? It is only available to Pro users.

I have a pretty good use case for gpt-oss. The time savings have actually been wild. Definitely worth a try. Just to be clear, it gets like 2000 tok/s.