The Chinese models will not overtake the frontier US ones given the current way things are going. The US models derive their lead from incredible efforts to source more and higher quality (mostly synthetic) data via great feats (e.g. generating with humongous teacher models that could never feasibly serve interactive traffic). The Chinese models advance via heroic efforts to optimize models and great feats to secure more and higher quality training data from the US frontier models.
For a (Chinese) open weight model to surpass the (US lab) frontier models, this equation must flip and the Chinese labs must entirely retool from harvesting frontier model data to producing the data systems and efforts to produce novel data, as well as procuring latest generation hardware en masse for this. This does not happen easily. Also, training a frontier scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.
Unless you are working at one of these companies you don't know what they are doing.
You don't know what's happening in Z.ai or Alibaba. And you don't know what's happening in Anthropic or OpenAI.
I don't know what they are all doing, but I find it extremely unlikely that they are not all collecting data from one another. I am confident Anthropic has a team going over the GLM 5.2 weights, even if it's just to see where the competition is.
Just because some labs are getting data from Anthropic does not mean they are not also doing their own research.
They were focused on optimization because they could not get the best hardware. The only reason their top labs are behind may be that they did not have H200s and MI350s. And now they do.
Plus you are discounting other risks: Anthropic is currently sitting on "the best" models in the world because they got in a pissing match with the US administration.
btw: this could be the case in China as well. Their administration has been surprisingly open on AI exports and open weight models, as far as we know. There is a small but non-trivial chance they are hogging a better version of GLM 5.2, for example, and no one is allowed to talk about it. Now, I am not saying that is the case; I am saying the two cases (Chinese labs are 6 months behind, or they are forced to suppress their best models) are indistinguishable.
> Chinese labs must entirely retool from harvesting frontier model data to producing the data systems and efforts to produce novel data
Even if your characterization is accurate, they could do this tomorrow and are not so myopic that they wouldn’t have thought about it. I don’t see this as a barrier, and I see a lot of the same underestimation of Asia that’s been happening for 50 years. There’s not some innate American advantage to building LLMs, and personally I think whatever head start the US has is going to be squandered on delays from the export control “too dangerous for release” LARPing we’re seeing.
I am not sure which part you are interpreting as underestimation or whatever? Quite the opposite: I claim the difference arises from a difference in strategies, not from intrinsic differences in ability.
Also, I was responding to a claim about what will happen in less than 6 months (that’s about the limit of what you can meaningfully predict in this field).
These strategies take materially different resources; it’s not an overnight decision made by leadership. I suppose there is a natural experiment ongoing at Meta regarding this: it seems they recently moved a number of people, overnight, into a division to produce such data. So we will find out soon how quickly they climb the leaderboards.
Exactly. If they wanted to, they could produce the same amount of data. Companies like Scale, Mercor, and Surge exist for a reason, a reason that doesn't need to exist in China if they mandate that Chinese enterprises provide all their real world data (or have them work inside RL environments) to the model companies for post training. There is no real advantage that US companies have except a head start, and as Jensen said, a ton of the research advantage is skewed since a lot of the best researchers in the US are Chinese nationals. I do think the model is just one piece of the pie (not to echo Jensen too much), and hopefully we will always be able to serve these bigger frontier models in a much more efficient way, as well as build out the application layer faster, which is what actually makes them useful and/or more dangerous/powerful.
Why would those have any impact on R&D speed? Most are funded and close to cash flow positive
The amount of data Anthropic has claimed was extracted for distillation is tiny in comparison to the entire internet, which is right there for the taking and holds most of the knowledge people expect models to have.
Distilling even with small amounts of data from a better model is still helpful, not in the sense of transferring capabilities the raw internet-trained model doesn't have at all, but for identifying those capabilities that are compatible with the servile assistant persona and suppressing others that are undesirable (e.g. trolling). A primitive version of this was the instruction-tuning datasets generated with ChatGPT, as used e.g. for Alpaca.
Without a clear target to emulate, competitors might have to rely more on human raters, but there are plenty of data labeling companies in China, so that's hardly a hurdle.
I think you are making a distinction between pre-training and later stages? The value of e.g. Fable output is exactly the careful preference optimization embedded in those responses. Not all data is the same (sorry if my first comment was sloppy on that).
“China can only copy the US” is a very short-sighted and uninformed opinion. There is more coming out of China than just new ways to distill models.
I don’t know how anyone can look at the innovation going on at DeepSeek and come to the conclusion that China can only copy.
Distillation and copying are how they’ve bootstrapped their models, but that feels not so different than Anthropic and Meta torrenting millions of pirated books.
The Chinese labs are solving problems for a different set of constraints.
How so? You'll soon have your choice of a very old OAI model or a new Chinese model, because the USG has no interest in letting you access the newest models without explicit permission.
Their point is that the Chinese models will also be limited to the very old OAI models unless things flip, as they said.
The use of US models for Chinese model training is part of the motivation of all of this.
Apologies - I was too quick in my response. I was speaking from a "how the users will perceive it" point of view. China's pretty good at the internet reputation thing.
I don’t think anyone seriously believes any of the Chinese models are ever going to “overtake” the American frontier models. I doubt that that’s even their goal.
But if they can stay on pace, within say 6 to 12 months of the bleeding edge of the American frontier models, that’s a huge problem.
If they can just piggyback on the Herculean efforts of Anthropic, OpenAI, Google etc., accept a little bit of lag, and save billions of dollars? Why wouldn’t they?
And for the end user, why would they pay a premium subscription price for something they can just wait six months for and run on their own hardware at home? In my opinion, this is the cat and mouse game that’s being played right now. And I suspect it’s intentional on the side of the open weight models. I would bet they are waging a war of attrition.
Coding is a case where it's possible to programmatically generate large amounts of data relatively cheaply. China could realistically surpass the US in coding while still being behind in many other areas.
Also worth noting that China has more data to work with in general, having a much bigger population.
Chinese frontier models don't need to catch up in every category. They just need to win in coding and that's exactly where they are going. The gap went from 12+ months to 1-2 months with the latest release of GLM 5.2. Coding is a task where you don't need heroic efforts to find rare, long-tail training data; you can just outsmart your competitor by optimizing algorithms and training recipes. This is something they can do at scale with the money and talent pool.
> They just need to win in coding and that's exactly where they are going.
They don't even need to 'win' in the sense of maxing the benchmark. They can be 20% worse/50% cheaper and many of us (and our managers who approve our token budgets) will be in.
DeepSeek is 30x cheaper for input/75x cheaper for output than Sonnet on OpenRouter, and it's not a whole lot worse for many things.
Anthropic/OpenAI's valuations are built on the assumption of capturing most of the market and having the pricing power to jack up prices for tokens.
Kneecapping their pricing power is enough to trigger a valuation reset of an order of magnitude and humble them a bit.
Plus there are always infrastructure and hardware providers who want to keep their share of profits and will squeeze Anthropic's margins to deflate their valuation (Nvidia, AWS, RAM manufacturers, etc.).
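To put a rough number on the "20% worse/50% cheaper" trade-off above, here is a back-of-the-envelope sketch. It assumes (my assumption, not something stated above) that "20% worse" means the cheaper model succeeds on 80% as many tasks and that failed attempts are simply retried at the same price. The function name and parameters are made up for illustration.

```python
def relative_cost_per_success(relative_price: float, relative_success: float) -> float:
    """Cost per successful task for the cheaper model, relative to the
    frontier model (which is 1.0 by definition).

    relative_price:   cheaper model's price per attempt / frontier price
    relative_success: cheaper model's success rate / frontier success rate
    Assumes failures are retried independently at the same price.
    """
    if relative_success <= 0:
        raise ValueError("relative_success must be positive")
    return relative_price / relative_success


# "20% worse / 50% cheaper": half the price, 80% of the success rate.
ratio = relative_cost_per_success(relative_price=0.5, relative_success=0.8)
print(f"{ratio:.3f}")  # prints 0.625
```

Under those assumptions the cheaper model still costs about 62.5% as much per finished task, so most of the headline discount survives the quality gap. The picture changes if failures are expensive to detect or can't be retried.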
This seems wildly naive. This entire field is like 4 years old. We have quite frankly no idea about what things will look like in 4 more years.
The article makes a very specific claim with a clear deadline less than 6 months away. I do not underestimate the Chinese labs and their capabilities; if they wish, they can retool to start overtaking the US labs with a different strategy. My comment shouldn’t be read as a permanent impossibility statement, just an observation on where we are right now. At the moment their strategy seems to be to produce decent quality, highly optimized models, and a pivot will take longer than 6 months to materialize into overtaking the frontier labs (which themselves do not look like they will throw in the towel in the next 6 months).
Yeah, this is, to be perfectly blunt, cope, for several reasons:
1. It's unclear if there is a law of diminishing returns with ever-larger models. They're more expensive to run and for many applications, you'll probably find smaller models are sufficient;
2. There's an inbuilt market for local LLMs. This is an effective limit on how large models can get. Case law hasn't been established yet on, for example, if a law firm using ChatGPT breaks privilege. Specifically, chat logs may be discoverable. Medical applications have this issue too and I think you'll find that financial firms are going to be leery about this as well;
3. Better, larger models will bleed into smaller, open source models. The chat logs themselves are training data. There's a whole market in China for Claude tokens around this;
4. China has a national security interest in not being beholden to US tech giants when it comes to AI. China has a history of being able to commit to large-scale long-term projects and Anthropic just won't be able to compete with a national project by one of the world's superpowers, if it comes down to it;
5. Winning doesn't necessarily mean being the best. Often it's just being good enough;
6. As an example of a national project, China is busy replicating EUV because of the US ban on ASML and Nvidia exporting their best stuff. I don't think many in the West are prepared for how rapid this will be. I'm reminded of the policy debate in 1945 when many in American policy and military circles thought the USSR would never catch up on the atomic bomb or, if they did, it would take 20+ years. It took 4 years. For the hydrogen bomb, it took 1. The US hardware advantage is a lot more tenuous than many realize.
> source more and higher quality (mostly synthetic) data
Kind of an oxymoron, don’t you think?
If they could generate data that looked kind of real, why don’t they just generate that data on the fly during inference?