They actually explained this a few days back (can't seem to find the link right now). The core of the explanation was its architecture:
1. MoE (nothing new here, but, this helps a lot)
2. Compressed Attention Mechanisms (this is their core innovation) - this dramatically reduces the Key-Value (KV) cache requirements for longer contexts
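To put rough numbers on the two points above, here is a back-of-the-envelope sketch. All the dimensions (layer count, head count, latent width, expert counts) are illustrative assumptions and not DeepSeek's published configuration. The "compressed" variant simply assumes each layer caches one small latent vector per token instead of full per-head keys and values.

```python
def kv_cache_bytes_full(num_layers, num_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV cache size for standard multi-head attention.

    Each token stores one key and one value vector per head, per layer,
    hence the leading factor of 2.
    """
    return 2 * num_layers * num_heads * head_dim * seq_len * bytes_per_elem


def kv_cache_bytes_compressed(num_layers, latent_dim, seq_len, bytes_per_elem=2):
    """KV cache size when each layer stores a single compressed latent per token.

    Assumption: keys and values are reconstructed from this latent at
    attention time, so only latent_dim elements are cached per token per layer.
    """
    return num_layers * latent_dim * seq_len * bytes_per_elem


def moe_active_fraction(num_experts, experts_per_token):
    """Fraction of expert parameters that a single token actually touches."""
    return experts_per_token / num_experts


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    layers, heads, head_dim, latent_dim = 60, 128, 128, 576
    context = 128_000  # tokens in one long request

    full = kv_cache_bytes_full(layers, heads, head_dim, context)
    small = kv_cache_bytes_compressed(layers, latent_dim, context)

    print(f"full KV cache:       {full / 2**30:.1f} GiB")
    print(f"compressed KV cache: {small / 2**30:.1f} GiB")
    print(f"reduction factor:    {full / small:.1f}x")
    print(f"MoE active fraction: {moe_active_fraction(256, 8):.3f}")
```

With these made-up dimensions the per-request cache shrinks by well over an order of magnitude, which is the kind of gap that lets one GPU serve many more long-context requests at once.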
Another thing that helps is significantly lower energy costs in China.
Another point, and this is my own guess: they are running some percentage of their inference on their own home-grown AI inference chips.
Their models are organized around inference efficiency from the start; it's what they're focusing on. They also come from HFT and are good at low-level optimization. For v3, they literally reverse engineered Nvidia GPUs for undocumented behavior that helped against memory bottlenecks, wrote file systems for efficient model serving, and did a ton of low-level grunt work at a time when everyone else just relied on torch. Being compute-constrained helped as well - necessity is the mother of invention.
But what is preventing their competitors, who have many more employees who are also very talented, from doing the same?
Every little improvement would save them billions, so it's hard to imagine they aren't pouring a lot of resources into that already.
If my grandmother had wheels...
What makes most hardware companies fail at software, for example? AI shops are usually run by ML people, and succeeding in unrelated areas of expertise is hard for any organization.
But surely Google has both ML people and experts at optimising stuff, be it hardware or software. In my opinion they have the talent, the sheer number of employees and the capital. Can DeepSeek really have people who are that much more talented at optimizing stuff?
No, I don't think they can. But then Google literally has its own custom inference hardware to target, and yet 3.5 flash is extremely pricey compared to v4 pro, so now I'm wondering why that would be. It's difficult to imagine they don't care, given we know they're prepared to pay $2B / mo for additional GPU capacity.
The answer is a lean team that is also resource constrained. This not only fosters creativity, but also reduces bloat. People heavily underestimate how much inefficiency (bloat) heavy bureaucracy adds.
To us outside of the US, it was pretty obvious from day 1 of the US chip-related sanctions on China that they would actually end up benefiting China more than punishing it.
Just wait till they flood the market with dirt-cheap GPU chips. And these are coming pretty soon.
That is a very good question. It is open source / open weight, yet none of the third-party providers that also host DeepSeek seem to be able to match DeepSeek itself on price.
My guess is that they do aggressive caching and some proprietary optimizations in their hosting setup that they haven't published. Maybe they are also running at a loss to gain market share.
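For what "aggressive caching" could mean in practice, here is a toy sketch of prefix caching: if a new request starts with tokens that were already processed (say, a shared system prompt), only the new suffix needs to be computed. This is an illustration of the general idea only. The `PrefixCache` class and its method are hypothetical names, and nothing here is known to match what DeepSeek actually runs.

```python
class PrefixCache:
    """Toy prefix cache that tracks which token prefixes were already processed.

    A real serving stack would store attention state per prefix; this sketch
    only counts how many tokens still need computation. It keeps every prefix
    as a tuple, which is wasteful but keeps the example short.
    """

    def __init__(self):
        self._seen = set()

    def tokens_to_compute(self, tokens):
        """Return the number of tokens not covered by a cached prefix."""
        # Find the longest prefix of this request that is already cached.
        cached = 0
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self._seen:
                cached = end
                break
        # Record every longer prefix so later requests can reuse this work.
        for end in range(cached + 1, len(tokens) + 1):
            self._seen.add(tuple(tokens[:end]))
        return len(tokens) - cached


if __name__ == "__main__":
    cache = PrefixCache()
    system_prompt = ["You", "are", "a", "helpful", "assistant", "."]

    first = cache.tokens_to_compute(system_prompt + ["What", "is", "MoE", "?"])
    second = cache.tokens_to_compute(system_prompt + ["Explain", "KV", "cache"])

    print(first)   # nothing cached yet, so every token is computed
    print(second)  # shared system prompt is reused, only the new suffix counts
```

The more traffic shares long prefixes, the larger the fraction of input tokens that can be served from cache, which is one plausible way a first-party host could undercut third parties running the same weights.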
And judging from latency and network performance, I don't think what you reach when you access deepseek.com from Europe is hosted in China.
It's clear to me they are subsidizing inference in exchange for market share, and doing it at this scale makes the most sense if their target is getting more user data. Note that this sort of pricing isn't far off from the equivalent token-based pricing of ChatGPT or Claude subscription plans, which are more clearly subsidized by the user's data.