While the costs are lower than the frontier models', there are two factors that make DS4 Pro and K2.6 not as cheap as they might look.
For DS4 Pro there's currently a discount on the official API, which sometimes gets overlooked or mixed up in discussions. Simon uses the full price in his comparison, so that's not an issue here.
The other issue is that DS4 Pro and K2.6 often use far more reasoning tokens than the frontier models. In my testing there are pathological cases where a request ends up costing about the same as with a frontier model simply because of how many more tokens they burn. To be fair, I'm using DS and Kimi via third-party providers, so the providers' setups might be part of the problem.
But if you look at the Artificial Analysis pages for the models, you'll see that DSv4 Pro used 190M tokens and K2.6 used 170M tokens on their intelligence benchmark, while GPT 5.5 (high) used only 45M. [0][1][2]
I recommend looking at the "Intelligence vs. Cost to Run Artificial Analysis Intelligence Index" chart ("Intelligence vs Cost" in the UI). The open-source models are still cheaper to run, but not by as much as the token prices alone would suggest.
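To make the point concrete, here's a rough sketch of the arithmetic. The token counts are the ones from the Artificial Analysis pages linked below; the per-token prices are made-up placeholders for illustration, not anyone's real pricing:

    # Effective benchmark cost = tokens used x price per token.
    # Token counts are from the Artificial Analysis pages cited below;
    # the per-token prices are hypothetical, NOT real list prices.

    PRICE_PER_MTOK = {          # hypothetical $/1M output tokens
        "DSv4 Pro": 1.00,
        "K2.6": 1.20,
        "GPT 5.5 (high)": 5.00,
    }
    TOKENS_MTOK = {             # tokens used on the intelligence benchmark
        "DSv4 Pro": 190,
        "K2.6": 170,
        "GPT 5.5 (high)": 45,
    }

    for model, mtok in TOKENS_MTOK.items():
        cost = mtok * PRICE_PER_MTOK[model]
        print(f"{model}: {mtok}M tokens -> ${cost:.0f}")

    # Even at 5x the per-token price, the ~4x token efficiency of the
    # frontier model narrows the gap considerably.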
[0] https://artificialanalysis.ai/models/deepseek-v4-pro
[1] https://artificialanalysis.ai/models/kimi-k2-6
[2] https://artificialanalysis.ai/models/gpt-5-5-high
This is very wrong; DS4 is super cheap. I'd advise starting with their release paper: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
They introduce genuinely novel methods to improve long-context efficiency and attention: HCA and mCH. Compared to v3.2, inference requires only 27% of the FLOPs and the KV cache only 10% of the memory. This makes it super efficient. Think about it: with the same compute we can now serve more than 3x the load, and you need only a tenth of the prior KV cache.
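Quick sanity check on what those ratios mean (just the arithmetic that follows from the paper's 27% and 10% figures):

    # Ratios from the release paper; everything else follows from them.
    FLOPS_RATIO = 0.27   # inference FLOPs vs v3.2
    KV_RATIO = 0.10      # KV cache memory vs v3.2

    serving_multiplier = 1 / FLOPS_RATIO   # load served per unit compute
    kv_multiplier = 1 / KV_RATIO           # context cached per GB

    print(f"~{serving_multiplier:.1f}x throughput on the same compute")
    print(f"~{kv_multiplier:.0f}x context per GB of KV cache")
    # -> ~3.7x throughput and ~10x context per GB, matching the
    #    "more than 3x" claim above.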
Furthermore, this release is a PREVIEW. DeepSeek is the real open lab: not only do they cook up quite a bit with every single release, they publish and share it. I'm running this locally.
Let me tell you how "CHEAP" this is. With v3.2 I would run out of GPU RAM and spill into system RAM at 256k context. It ran alright and I was happy with my 7 tk/sec. With this, I'm 100% in GPU RAM with the full 1 million token context, running more than 2x as fast while getting better results.
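If you want the back-of-envelope for why that fits now: assume some per-token KV cost for v3.2 (the 70 KB/token below is a made-up illustrative number, not from the paper) and apply the ~10% factor:

    # Why 1M context now fits where 256k used to spill. The per-token
    # KV cost for v3.2 is a hypothetical figure for illustration only;
    # the 10% factor is the one from the release paper.

    KV_KB_PER_TOKEN_V32 = 70          # hypothetical per-token KV cost
    NEW_CACHE_FACTOR = 0.10           # new model's KV cache vs v3.2

    def kv_gb(tokens, kb_per_token):
        return tokens * kb_per_token / 1024 / 1024

    old_256k = kv_gb(256_000, KV_KB_PER_TOKEN_V32)
    new_1m = kv_gb(1_000_000, KV_KB_PER_TOKEN_V32 * NEW_CACHE_FACTOR)

    print(f"v3.2 @ 256k context: ~{old_256k:.0f} GB of KV cache")
    print(f"new  @ 1M context:   ~{new_1m:.0f} GB of KV cache")
    # ~4x the context for well under half the memory, which is why a
    # cache that used to spill into system RAM can now stay on the GPU.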
This is super cheap. Moonshot has made it clear that they're starved for GPUs, and that's why. If they had the GPU capacity we do in the US and subsidized the models the way we do here, they'd be giving it away for free!
> I'm running this locally.
Impressive! What is your setup? Are you running the full DeepSeek V4 Pro, or V4 Flash?
Sure, that can happen, but it hasn't been my experience. I just spent a whole day using it for some pretty hefty refactors: many rounds of back-and-forth, thousands of lines of code changed, reviews, investigations, many subagents running parallel tasks, the works. Total cost: $0.95.
I had attempted this with Opus 4.6 in the past and it burned through the $10 budget I’d given it before it returned from my initial prompt.
Even if it's heavily discounted, it would still have cost me single digits for a complete solution versus double digits for exactly nothing.
Sounds promising, thanks for your report.
I didn't mean to say that they're not cheaper to run; Artificial Analysis shows that they're cheaper too. My main point was that it's important to also look at token efficiency, not just cost per token, to get the full picture.