The more fundamental bottleneck is not even the frontier models; it's the datacenters. Let's say Europe breaks away from the US completely tomorrow. It does not have enough datacenters (or GPUs in general) to sustain its inference needs, even if it resorted to Chinese open models. And to build new datacenters, it would need to source parts from the US and China.

In other words, if AI does have continued significant economic impact, only the US and China would be able to leverage it fully. The rest of the world is implicitly betting that AI won't be good enough, or that the compute curve eventually flattens out, so that a model 10x larger only brings marginal benefits.


> The more fundamental bottleneck is not even the frontier models; it's the datacenters.

Is it, though? Quantization and speculative decoding are improving the local AI story by leaps and bounds every month.
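To put a number on what quantization buys, here is a toy sketch (not any particular library's scheme, just symmetric per-tensor int8 with illustrative sizes):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Toy symmetric per-tensor int8 quantization: float weights -> int8 plus one scale."""
    scale = np.abs(w).max() / 127.0                      # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float):
    """Approximate reconstruction used at inference time."""
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)      # one hypothetical weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

print(f"fp32 size: {w.nbytes / 2**20:.0f} MiB")         # ~64 MiB
print(f"int8 size: {q.nbytes / 2**20:.0f} MiB")         # ~16 MiB, 4x fewer bytes to store and read
print(f"max abs error: {np.abs(w_hat - w).max():.4f}")
```

The 4x (or 8x with 4-bit schemes) reduction in weight bytes is most of the story: single-user local inference is usually memory-bandwidth bound, so smaller weights mean both fitting the model at all and reading it faster per token.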

Speculative decoding is not that useful at scale; it's mostly about making local, single-user inference faster. Once you're batching multiple requests together, a batched forward pass already costs about as much as the verification pass you'd have to run with speculative decoding, so there's little to gain.
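For anyone who hasn't seen it: a small draft model proposes a few tokens and the big model checks them. The minimal greedy variant looks roughly like this (toy sketch, `target` and `draft` are placeholder callables rather than a real API; in a real system step 2 is a single batched forward pass rather than a loop):

```python
def speculative_decode(target, draft, prompt, k=4, max_new=64):
    """Greedy speculative decoding sketch.

    `target` and `draft` stand in for models: callables that take a token
    sequence and return the next token. The point is that one target pass
    can verify k drafted tokens at once, which cuts single-user latency;
    on a batched server the target's forward passes are already kept busy
    by other requests, so the trick buys much less.
    """
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # 1. Draft k tokens cheaply with the small model.
        drafted = []
        for _ in range(k):
            drafted.append(draft(tokens + drafted))

        # 2. The target scores every drafted position. Written as a loop here
        #    for clarity; in practice this is one forward pass over the whole
        #    drafted sequence.
        target_preds = [target(tokens + drafted[:i]) for i in range(k + 1)]

        # 3. Keep the longest agreeing prefix, then take one token from the target.
        n_accept = 0
        while n_accept < k and drafted[n_accept] == target_preds[n_accept]:
            n_accept += 1
        tokens += drafted[:n_accept] + [target_preds[n_accept]]
    return tokens[: len(prompt) + max_new]

# Toy demo over integer "tokens": a perfect draft model, so every token is accepted.
if __name__ == "__main__":
    pattern = [1, 2, 3, 4] * 32
    model = lambda seq: pattern[len(seq) % len(pattern)]
    print(speculative_decode(target=model, draft=model, prompt=[0], k=4, max_new=16))
```

That's why it's mostly a latency trick: you only come out ahead when the big model would otherwise sit idle between tokens.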

The future will have LLMs running locally on your laptop/devices, if not almost exclusively then at least for 90-95% of tasks. Speculative decoding is just one of many existing techniques, with more to come, that will make this even more viable. The gap is closing on both fronts: software gets faster and more clever, hardware gets faster and smaller. The single-user story is the story. I'm obviously speculating myself, but that's how I see it.

There is "local AI" which is running on consumer grade hardware and "local AI" which still needs a datacenter (DeepSeek 4, GLM 4.7, etc). If you woke up tomorrow and could only use the latter you are about 6 months behind the frontier, if you have to rely on the former you are 2 or 3 years behind.

All these tricks like quantization and speculative decoding can also be used by the leading AI labs, which means they will simply have more compute than you at the end of the day. So far this has translated into better performance.

Nothing released so far inherently "needs" a datacenter; it's just a matter of how much performance you require. Slow, high-latency inference will be a natural way to run "datacenter" models locally.

Yes it does. You will not be able to run models like DeepSeek v4 (>1.5 trillion parameters) on a regular workstation any time soon, unless by "slow" you mean "unusable". And those are the models that are ~6 months behind Opus 4.7.
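Back-of-the-envelope on the weights alone (illustrative, ignores KV cache and activations):

```python
params = 1.5e12                       # >1.5 trillion parameters, per the claim above
for bits in (16, 8, 4):
    gib = params * bits / 8 / 2**30   # bytes for the weights only
    print(f"{bits}-bit weights: ~{gib:,.0f} GiB")
# 16-bit: ~2,794 GiB   8-bit: ~1,397 GiB   4-bit: ~698 GiB
```

Even aggressively quantized to 4 bits, you need roughly 700 GiB just to hold the weights, which is why "run it locally" currently means either a much smaller model or streaming weights from disk at unusable speeds.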


But ASML is in Europe, so Europe holds at least one critical piece of the stack.

In theory, yes. They've got a bargaining chip with TSMC. But it's unclear how much use that would be without a safe shipping route between Europe and Taiwan, or a navy capable of keeping one open.