> > If OpenAI or Anthropic could squeeze the same output out of smaller GPUs and servers they'd be doing it for themselves.
> First, they do this; that's why they release models at different price points.
No, those don't deliver the same output. The cheaper models are worse.
> It's also why GPT-5 tries auto-routing requests to the most cost-effective model.
These are likely the same size; one uses reasoning and the other doesn't. Skipping reasoning is cheaper, but not because the model is smaller.
But they also squeezed an 80% price cut out of o3 at one point, supposedly purely through inference or infra optimization.