Shy of an algorithmic breakthrough, open source isn't going to catch up with SOTA; their main trick for model improvement is distilling the SOTA models. That's why they have perpetually been "right behind".
They don't need to catch up. They just need to be good enough and fast as fuck. The vast majority of useful LLM tasks have nothing to do with how smart the model is.
The GPT-5 models have been the most useless models of any released this year despite being SOTA, and it's because they're slow as fuck.
For coding I don’t use any of the previous gen models anymore.
Ideally I would have both fast and SOTA; if I had to pick one, I'd go with SOTA.
There's a report by OpenRouter on what folks tend to pay for; it's generally SOTA in the coding domain. Folks are still paying a premium for them today.
There's a question of whether there's a bar where coding models are "good enough"; for myself, I always want smarter / SOTA.
FWIW, coding is one of the largest use cases for LLMs where SOTA quality matters.
I think the bar for when coding models are "good enough" will be a tradeoff between performance and price. I could be using Cerebras Code and saving $50 a month, but Opus 4.5 is fast enough, and I value the peace of mind I have knowing its quality is higher than Cerebras' open source models enough to spend the extra money. It might take a while for this gap to close, and what is considered "good enough" will be different for every developer, but certainly this gap cannot exist forever.
I just use a mix of Cerebras Code for lots of fast/simpler edits and refactoring and Codex or Claude Code for more complex debugging or planning and implementing new features, works pretty well. Then again, I move around so many tokens that doing everything with just one provider would need either their top of the line subscriptions or paying a lot per-token some months. And then there's the thing that a single model (even SOTA) can never solve all problems, sometimes I also need to pull out Gemini (3 is especially good) or others.
> just need to be good enough and fast as fuck
Hard disagree. There are very few scenarios where I'd pick speed (quantity) over intelligence (quality) for anything remotely related to building systems.
If you think a human working on something benefits from being "agile" (building fast, shipping quickly, iterating, getting feedback, improving), why should it be any different for AI models?
Implicit in your claim are specific assumptions about how expensive/untenable it is to build systemic guardrails and human feedback, and specific cost/benefit ratio of approximate goal attainment instead of perfect goal attainment. Rest assured that there is a whole portfolio of situations where different design points make most sense.
> why should it be any different from AI models?
1. Law of diminishing returns: AI is already much, much faster than humans at many tasks, especially at spitting out text, so becoming even faster doesn't always make much of a difference.
2. Theory of constraints: the throughput of a system is mostly limited by its "weakest link" or slowest part, which might not be the LLM but some human-in-the-loop, and that bottleneck might be reduced only by smarter AI, not by faster AI.
3. Intelligence is an emergent property of a system, not a property of its parts. In other words: intelligent behaviour is created through interactions, and more powerful LLMs enable new levels of interaction that are just not available with less capable models. You don't want to bring a knife, not even the quickest one in town, to a massive war of nukes.
I agree with you for many use cases, but for the use case I'm focused on (Voice AI), speed is absolutely everything. Every millisecond counts for voice, and most voice use cases don't require anything close to "deep thinking". E.g., for inbound customer support, we really just want the voice agent to be fast and follow the SOP.
If you have a SOP, most of the decision logic can be encoded and strictly enforced. There is zero intelligence involved in that part; it's just if/else. The key part is understanding the customer request and mapping it to the cases encoded in the SOP, and for that part, intelligence is absolutely required, or your customers will not feel "supported" at all and would be better off with a simple form.
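A minimal sketch of the split I mean. The names and the keyword matcher are made up for illustration; in practice `classify_intent` is where the model goes, and everything downstream is deterministic:

```python
# Hypothetical sketch: the model only maps free-form text to a known intent;
# everything after that point is deterministic SOP logic.

SOP_HANDLERS = {
    "refund": lambda order: f"Refund initiated for order {order}",
    "cancel": lambda order: f"Order {order} cancelled",
    "status": lambda order: f"Order {order} is in transit",
}

def classify_intent(utterance: str) -> str:
    """Stand-in for the model call: crude keyword matching here, an LLM in practice."""
    text = utterance.lower()
    if "refund" in text or "money back" in text:
        return "refund"
    if "cancel" in text:
        return "cancel"
    return "status"

def handle_request(utterance: str, order_id: str) -> str:
    intent = classify_intent(utterance)    # the only "intelligent" step
    return SOP_HANDLERS[intent](order_id)  # pure if/else from here on

print(handle_request("I want my money back", "A-123"))
```

The point: make the intelligent step as small as possible, and keep the enforced SOP logic auditable.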
As a customer when confronted with such a system I hang up and never ever do business with that company again. Regardless of polish, they're useless.
What do you mean by "such a system"? One that uses AI to funnel your natural language request into their system of SOPs? Or one that uses SOPs to handle cases in general? SOPs are great; they drastically reduce errors, since the total error is the square root of the sum of squares of random error and bias. While bias still occurs, the random error can and should be reduced by SOPs whenever possible. The problem is that SOPs can be really bad: "Wait, I will speak to my manager" -> probably a bad SOP. "Wait, I will get my manager so that you can speak to them" -> might be a better SOP, depending on the circumstances.
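The arithmetic behind that error claim, as a toy sketch (the numbers are illustrative, assuming random error and bias combine in quadrature as independent components):

```python
import math

# Total error when an independent random component and a bias component
# add in quadrature.
def total_error(random_err: float, bias: float) -> float:
    return math.sqrt(random_err**2 + bias**2)

# Halving the random error (e.g. via an SOP) shrinks the total
# even when the bias stays put:
print(total_error(4.0, 3.0))  # 5.0
print(total_error(2.0, 3.0))  # ~3.61
```

That's why attacking the random-error term alone is already worthwhile.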
It never works. You always just get the digital equivalent of a runaround, and there simply isn't a human in the loop to take over when the AI botches it (again). So I gave up trying; this crap should not be deployed unless it works at least as well as a person. You can't force people to put up with junk implementations of otherwise good ideas in the hope that one day you'll get it right. Customer service should be a service, because on the other end of the line is someone with a very high probability of already being dissatisfied with your company and/or your product. For me this is not negotiable: if my time is less valuable to you, the company, than it would cost to actually put someone on to help, then my money will go somewhere else.
I'm still not sure if you're speaking of SOPs in general or AI interfaces to them. Why don't you answer that simple question before ranting on?
Speed is great for UI iteration or any case where a human must be in the loop.
As long as the faster tech is reliable and I understand its quirks, I can work with it.
> They don't need to catch up. They just need to be good enough
The current SOTA models are impressive but still far from what I'd consider good enough to not be a constant exercise in frustration. When the SOTA models still have a long way to go, the open weights models have an even larger gap to close.
GPT-5 Codex is great - the best coding model around, except maybe for Opus.
I'd like more speed, but I prefer more quality over more speed.
This. You can distill a foundation model into open source. The Chinese will be doing this for us for a long time.
We should be glad that the foundation model companies are stuck running on treadmills. Runaway success would be bad for everyone else in the market.
Let them sweat.
I'd prefer a 30-minute response from GPT-5 over a 10-minute response from {Claude/Google} <whatever their SOTA model is> (yes, even Gemini 3).
Reason: while these models look promising in benchmarks and seem very capable at an affordable price, I *strongly* feel that OpenAI models perform better most of the time. I've had to clean up too many Gemini or Claude messes after vibe coding. OpenAI models are just much more reliable with large-scale tasks: organizing, chomping through tasks one by one, etc. That takes time, but the results are 100% worth it.
I get GPT-5.2 responses on Copilot faster than for any other model, almost instantly. Are you sure they're slow as fuck?
Confused. Is ‘fuck’ fast or slow? Or both at the same time? Is there a sort of quantum superposition of fuck?
It's an intensifier
Wasn't that supposed to be 'ass'
Then what would a double intensifier look like?
well, it's not slow as fuck! it's quick as lightning and speedy as hell
Bullseye.
Too bad, so sad for the Mister Krabs secret recipe-pilled labs. Shy of something fundamental changing, it will always be possible to make a distillation that is 98% as good as a frontier model for ~1% of the cost of training the SOTA model. Some technology just wants to be free :)
We trust in our lord and savior China and Zuck to keep the peasants fed.
> their main trick for model improvement is distilling the SOTA models
Could you elaborate? How is this done and what does this mean?
I am by no means an expert, but I think it's a process that allows training LLMs from other LLMs without needing as much compute or nearly as much data as training from scratch. I think this was the thing DeepSeek pioneered. Don't quote me on any of that, though.
No, distillation is far older than DeepSeek. DeepSeek was impressive because of algorithmic improvements that allowed them to train a model of that size with vastly less compute than anyone expected, even using distillation.
I also haven't seen any hard data on how much they use distillation-like techniques. They for sure used a bunch of synthetically generated data to get better at reasoning, something that is now commonplace.
Thanks, it seems I conflated things.
Yes. They bounced millions of queries off of ChatGPT to teach/form/train their DeepSeek model. This bot-like querying was the "distillation."
They definitely didn't. They demonstrated their stuff long before OAI and the models were nothing like each other.
Why would OpenAI allow someone to do that?
They don't anymore. They introduced ID verification shortly after, but it's hard to stop completely while also scaling fast.
They didn't, but how do you stop it? Presuming the scale that OpenAI is running at?