> At some point, you can let a less smart model hammer at a problem for longer and get to the same result, and as long as you are not involved it comes to the same thing.
Is that true? I find the smarter models can simply be effective where smaller models can't. It isn't a matter of just waiting longer.
It's almost certainly not true yet, but at some point there might be an equilibrium of speed vs. quality (and, let's not forget, cost) where it's true for most of what you do.
Perhaps you'd still turn to hosted models for the hardest tasks, but most tasks go local. It does seem like that would make demand go down significantly.
Of course that's all predicated on model advances plateauing, or at least getting increasingly expensive for incremental improvements, such that local open source models can catch up on that speed/quality/cost curve. But there is a fair amount of evidence that's happening. The models are still getting noticeably better, but relative improvement does seem to be slowing, and cost is seemingly only going up.
Why is this presumed to be de facto inevitable:
* local compute isn’t scaling as before, so algorithmic improvements are the only way models get meaningfully faster and smarter
* all those same algorithmic improvements would also be true for larger models
* hardware manufacturers have an incentive against local LLMs because cloud LLMs are so much more lucrative (+ corps would buy desktop variants if they were good enough)
So no, it’s not clear quality will ever be comparable. It may be good enough for what you want but there will always be a harder problem that you need to throw more compute and more memory at.
> It may be good enough for what you want but there will always be a harder problem that you need to throw more compute and more memory at.
Sure, but if the “good enough for what you want” covers the vast majority of cases, data-center AI becomes just for the very extreme edge cases. Like how I can render a 4K video game at 60fps on my home PC, but if Pixar wants to render their next movie they use data-center compute.
> all those same algorithmic improvements would also be true for larger models
Smaller models run faster. If ten runs of a small model get me the same quality result as one run of the big model, and the small model runs 10x faster, then they are functionally the same.
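A rough back-of-the-envelope sketch of that tradeoff (the latencies and run counts below are made-up illustrative numbers, not benchmarks of any real model):

```python
# Hypothetical comparison of "many runs of a small model" vs. "one run of a big model".
# All numbers are illustrative placeholders, not measurements.

big_model_latency_s = 100.0   # assumed wall-clock time for one run of the large model
small_model_latency_s = 10.0  # assumed 10x faster per run for the small model
runs_needed_small = 10        # assumed runs of the small model to match the big model's quality

small_total_s = small_model_latency_s * runs_needed_small

print(f"big model:   1 run   -> {big_model_latency_s:.0f} s")
print(f"small model: {runs_needed_small} runs -> {small_total_s:.0f} s")

# Under these assumptions the wall-clock time comes out the same. If the small runs
# are independent (e.g. best-of-N sampling), they can also be parallelized, which is
# where the "functionally the same" argument gets stronger.
```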
> Like how I can render a 4K video game at 60fps on my home PC, but if Pixar wants to render their next movie they use data-center compute.
This is actually a very nice analogy, and it has implications for the whole story about US vs. Chinese leadership in "frontier AI".