> It may be good enough for what you want but there will always be a harder problem that you need to throw more compute and more memory at.

Sure, but if "good enough for what you want" covers the vast majority of cases, then data-center AI becomes relevant only for the extreme edge cases. Like how I can render a 4K video game at 60fps on my home PC, but if Pixar wants to render their next movie, they use data-center compute.

> all those same algorithmic improvements would also be true for larger models

Smaller models run faster. If ten runs of a small model get me the same quality result as one run of the big model, and the small model runs 10x faster, then they are functionally the same.
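As a rough back-of-envelope sketch (the 10x figures are just the hypothetical numbers from this comment, not measurements of any real model), the wall-clock cost works out the same:

```python
# Back-of-envelope comparison: one run of a big model vs. ten runs of a
# small model that is 10x faster per run. All numbers are illustrative.

big_model_seconds_per_run = 10.0   # hypothetical latency of one big-model run
small_model_seconds_per_run = 1.0  # hypothetical: small model is 10x faster
runs_needed_small = 10             # assume ten small runs match one big run in quality

big_total = 1 * big_model_seconds_per_run
small_total = runs_needed_small * small_model_seconds_per_run

print(f"big model:   {big_total:.1f} s for one acceptable result")
print(f"small model: {small_total:.1f} s for ten runs (best-of-10)")
# Both come out to 10.0 s of wall clock on these assumptions, so the two are
# "functionally the same" in throughput; and if the ten small runs can be
# executed in parallel, the small model actually wins on latency.
```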

> Like how I can render a 4K video game at 60fps on my home PC, but if Pixar wants to render their next movie, they use data-center compute.

This is a very nice analogy, actually, and it bears on the whole story about US vs. Chinese leadership in "frontier AI".