There are 2 things worth separating.

1) China distills and is therefore morally bad.

As you rightly point out, that's not a great argument.

2) China distills and is therefore possibly not that competent.

I think that makes sense. If they only catch up to the frontier through distillation, then 1) their model will never be as good as the model they are distilling from, and 2) they will never reach the frontier, because they need someone else to get there first.

This is literally a repeat of the whole “China only makes low-quality cheap stuff” argument.

“All they do is copy.”

And now, oops, they are world leaders in EVs, batteries, solar, drones, just to name a few of the biggest consumer-facing things.

"Success leaves clues"

You gotta start somewhere, and you can start at page 1 or page 10. The time, energy and cost you saved by starting 9 pages later can be put into making whatever it is you're building better than the original.

The US, and every other country, is full of derivatives or straight-up copies. No one is getting super mad at the generic Cheerios at the grocery store. It's hypocrisy.

>2) China distills and is therefore possibly not that competent.

I think DeepSeek at least has done enough innovative work that you could grant them a baseline of competency.

In general, there are enough papers coming out of China to suggest that there are quite a few people there who know what they are doing.

You're correct and I shouldn't have used the word competent. Perhaps "and is therefore not elite enough to be state of the art"?

I also have a soft spot for DeepSeek because they write such readable papers. I don't have a degree in anything, but with a little work I can understand their papers, which I really appreciate.

But I still think my point stands: if you need distillation you won't be SOTA.

DeepSeek models are on the Pareto frontier of cost/performance. That's far more important than just making a top-scoring model.
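For anyone unsure what "on the Pareto frontier" means here, a minimal sketch: a model is on the frontier if no other model is both at least as cheap and at least as good, and strictly better on one of the two. The model names, costs and scores below are made up purely for illustration and are not real benchmark or pricing data; `pareto_frontier` is a hypothetical helper, not something from any library.

```python
def pareto_frontier(models):
    """Return names of models not dominated on (lower cost, higher score).

    `models` is a list of (name, cost, score) tuples.
    """
    frontier = []
    for name, cost, score in models:
        # A model is dominated if some other model is no worse on both
        # axes and strictly better on at least one of them.
        dominated = any(
            (c <= cost and s >= score) and (c < cost or s > score)
            for _, c, s in models
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical numbers, for illustration only (not real prices or scores).
models = [
    ("model_a", 30.0, 90.0),  # expensive, top score
    ("model_b", 1.0, 75.0),   # cheap, decent score
    ("model_c", 5.0, 70.0),   # pricier AND worse than model_b
]

print(pareto_frontier(models))
```

In this toy data both the expensive top scorer and the cheap model sit on the frontier, while `model_c` falls off because `model_b` beats it on both cost and score. The point of the comment is that being on the frontier at the cheap end can matter more than holding the top score.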

> China distills and is therefore possibly not that competent.

I heard that argument more than one year ago, when chain of thought and reasoning cycles started to be hidden to protect against distillation.

Meanwhile, models such as DeepSeek and MiMo are nothing short of excellent nowadays.

Ever since I switched away from OpenAI to DeepSeek I never felt the need to go back.

DeepSeek Flash V4 really was a "holy shit" moment and deserves the praise/hype it's been getting from users. For the last year I've maintained a multi-tier subscription strategy: 1. a $20-$30 plan (first Claude, now Codex) for "SOTA"; 2. Gemini via the extra $10/mo or so from my Google One plan; 3. a cheap fallback plan.

Together it gives me plenty of headroom/model performance for $40ish/mo, plus it lets me compare the various models over time.

Originally I'd been using the Z.AI plan (that I'm still grandfathered into for <1 yr) as my cheap plan, but it wasn't keeping up with SOTA progress and is slow/limited now. So I subscribed to the Opencode Go plan and use DeepSeek Flash V4 almost exclusively, and it is insane how much usage I can get for $10/mo.

I did the math on my Flash usage vs. what I'm paying Opencode and I'm typically not even exceeding $10 in API costs! So it's actually sustainable, not rugpull pricing, at least for me. I can pound it with requests/agentic loops and have it running for 30 min doing whatever the fuck, then check back and have spent literal pennies for what would have cost $30+ on my work's GitHub Copilot plan.

I know the enterprise world works under different rules and isn't price sensitive in the same way as an individual, but I truly don't see how it's sustainable for the US AI giants to maintain something like a 25x+ markup for a 1.25x performance benefit in the long term.
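To spell out the arithmetic behind that last sentence, here is a rough sketch. The 25x and 1.25x figures are just the ballpark numbers from the comment above, not measured values, and `perf_per_dollar_advantage` is a hypothetical helper name.

```python
def perf_per_dollar_advantage(price_multiple, performance_multiple):
    """How many times more performance per dollar the cheaper model gives.

    price_multiple: how many times more the expensive model costs.
    performance_multiple: how many times better the expensive model performs.
    """
    return price_multiple / performance_multiple


# Ballpark figures from the comment, not benchmark data.
advantage = perf_per_dollar_advantage(price_multiple=25.0,
                                      performance_multiple=1.25)
print(f"Cheaper model: about {advantage:.0f}x more performance per dollar")
```

With those inputs the cheaper model comes out around 20x ahead on performance per dollar. That only matters if the extra 25% of performance isn't the thing you actually need, which is the trade-off an enterprise buyer would have to judge for themselves.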

IMO it does help explain the recent emphasis on secret, scary "super models" like Mythos to muddy the waters for decision makers with hype and FOMO at a time when companies are beginning to seriously scrutinize their token spending for the first time.