The article just isn't that coherent for me.
> when a new model is released as the SOTA, 99% of the demand immediately shifts over to it
99% is in the wrong ballpark. Lots of users use Sonnet 4 over Opus 4, despite Opus being 'more' SOTA. Lots of users use 4o over o3, or Gemini over Claude. In fact the race for the 'best' model has never been closer: https://openrouter.ai/rankings
>switch from opus ($75/m tokens) to sonnet ($15/m) when things get heavy. optimize with haiku for reading. like aws autoscaling, but for brains.
> they almost certainly built this behavior directly into the model weights
??? Nothing about tiered model switching needs to live in the weights. It's plain client-side routing: the harness picks a cheaper model when the task or context gets heavy.
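Roughly this, as a made-up sketch (the tier names, threshold, and `pick_model` helper are all invented for illustration; only the per-million-token prices come from the quoted article):

```python
# Illustrative client-side tier switching ("AWS autoscaling, but for brains").
# Tier names and thresholds are invented for this example; the prices in the
# comments are the ones the quoted article cites.

def pick_model(context_tokens: int, task: str) -> str:
    """Route a request to a price tier in the client, not in the weights."""
    if task == "read":            # bulk file reading: cheapest tier
        return "haiku"
    if context_tokens > 100_000:  # "when things get heavy": mid tier, $15/M
        return "sonnet"
    return "opus"                 # default top tier, $75/M

assert pick_model(150_000, "edit") == "sonnet"
assert pick_model(2_000, "read") == "haiku"
```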
Overall the article seems to argue that companies are running into issues because consumers don't accept (or aren't used to) usage-based pricing, and nobody wants to be the first to crack and switch to it.
I don't think it's as big of an issue as the author makes it out to be. We've seen this play out before in cloud hosting.
- Lots of consumers are OK with a flat fee per month and using an inferior model. 4o is objectively inferior to o3, but millions of people use it (or don't know any better). The free ChatGPT tier is even worse than 4o, and the vast majority of ChatGPT visitors use it!
- Heavy users and businesses consume via API with usage-based pricing (see cloud). This is almost certainly profitable.
- Fundamentally most of these startups are B2B, not B2C
> Lots of users use 4o over o3
How much of that is the naming?
Personally I just avoid OpenAI's models entirely because I have absolutely no way of telling how their products stack up against one another or which to use for what. In what world does o3 sort higher than 4o?
If I have to research your products by name to determine what to use for something that is already a commodity, you've already lost and are ruled out.
It's the naming. He is confusing 4o/4o-mini with o4-mini; the latter is a pretty strong model and also one of the newest. Oh, and it's cheaper than the non-mini 4o.
There's both a 4o and an o4? And they're different?
Yes. 4o is a non-CoT model that is the continuation of the GPT-4 line, itself superseded by 4.1. o4 is the continuation of the CoT model line.
There's also 4o-mini and o4-mini...
No, I meant 4o over o3. For a ton of people, a reasoning model's latency is overkill when all they're asking for is inspiration on what to make for dinner.
o4-mini isn’t really that great in comparison to o3, and I still use o3 as my daily driver for reasoning tasks. I don’t really have a purpose for o4-mini, not even for coding tasks.
> In fact it's never been a closer race on who is the 'best'
Thank you for pointing out that fact. Sometimes it's very hard to keep perspective.
Sometimes I use Mistral as my main LLM. I know it's not lauded as the top-performing LLM, but the truth of the matter is that its results are just as useful as what the best ChatGPT/Gemini/Claude models output, and it is way faster.
There are indeed diminishing returns on the current crop of commercial LLMs. DeepSeek already proved that cost can be a major factor and quality can even improve. I think we're very close to seeing competition based on price, which might be why there is so much talk about mixture-of-experts approaches and how specialized models can drive down cost while improving targeted output.
If you're after speed, Groq is excellent. They've recently added Kimi K2.
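If you want to try it, Groq's endpoint is OpenAI-compatible, so the stock client works; something like this should do it (the Kimi K2 model id is my assumption, check Groq's current model list):

```python
# Minimal sketch: Groq exposes an OpenAI-compatible API, so the standard
# openai client can be pointed at it via base_url.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",  # placeholder, use your own key
)

resp = client.chat.completions.create(
    model="moonshotai/kimi-k2-instruct",  # assumed id for Kimi K2 on Groq
    messages=[{"role": "user", "content": "One-line summary of MoE models?"}],
)
print(resp.choices[0].message.content)
```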
Yeah, my biggest problem with Claude Code is that it's slow, prone to generating tons of bullshit exposition, and often goes down paths that I can tell almost immediately will yield no useful result.
It's great if you can leave it unattended, but personally, coding's an active thing for me, and watching it go is really frustrating.
I can't deal with any of the in-editor tools. I'd love something that handled inputting changes (with manual review!) while still giving me 100% control over the context and actually doing as it's told.