DeepSeek v4 Pro feels like Claude Opus 4.6 in its personality, but here's what I found out about costs:
I cut DeepSeek v4 loose on a decent-sized TypeScript codebase and asked it to focus on a single endpoint, go through it in depth layer by layer (API, DTOs, service, database models), form a complete picture of the types involved and introduced, and ensure no ad-hoc types were being introduced.
It produced a very brief but very to-the-point summary of the types being introduced and which of them were redundant, etc.
Then I asked it to simplify it all.
It obviously went through lots of files in both prompts but total cost? Just $0.09 for the Pro version.
On Claude Opus, I think (from past experience, before the price hikes) these two prompts alone would easily have burned somewhere between $9 and $13, with not much benefit.
Note: I didn't use OpenRouter; I used the DeepSeek API directly, because OpenRouter itself was being rate-limited by DeepSeek.
Even taking into account that they're billing at a 75% discount, it's still considerably cheaper.
Aren't they all billing at a discount?
Anthropic's and OpenAI's prices seem to include a fairly OK margin, from the very fourth-hand info I have.
How did you use it? OpenRouter, or provider directly?
The only similarity it has to Opus 4.6 is the 4 in the name. I do not understand these dishonest comparisons. OSS models are cool, cheap, and promising for the future -- but why are we pretending they are better than they are?
Speak for yourself. I found switching from Opus 4.7 to be completely painless and, in fact, due to the reliability of Anthropic's API, less friction despite slower response times. Zero issues on a large monorepo.
What provider are you using? I gave it a shot through OpenRouter and saw some weird half-formed words coming through occasionally; I would love to switch over and give it a proper go.
While the costs are lower than frontier models', there are two factors that make DS4 Pro and K2.6 not as cheap as they might look.
For DS4 Pro there's a discount going on for the official API, which sometimes gets overlooked and mixed up in discussions. Simon uses the full price in the comparison, so that's not an issue here.
The other issue is that DS4 Pro and K2.6 often use far more reasoning tokens than the frontier models. In my testing, there are certain pathological cases where a request can cost the same as with a frontier model because they use so many more tokens. To be fair, I'm using DS and Kimi via third-party providers, so those might have issues with their setups.
But if you look at the Artificial Analysis pages for the models, you'll see that DSv4 Pro used 190M tokens and K2.6 used 170M tokens for their intelligence benchmark, while GPT 5.5 (high) used only 45M. [0][1][2]
I recommend looking at the "Intelligence vs. Cost to Run Artificial Analysis Intelligence Index" ("Intelligence vs Cost" in the UI). The open source models are still cheaper to run, but not by as much as you'd think just looking at the token prices.
[0] https://artificialanalysis.ai/models/deepseek-v4-pro [1] https://artificialanalysis.ai/models/kimi-k2-6 [2] https://artificialanalysis.ai/models/gpt-5-5-high
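One way to make that comparison concrete is to multiply each model's benchmark token usage by its per-token price. A rough sketch, where the token counts come from the figures cited above but the per-million-token prices are hypothetical placeholders, not the models' actual rates:

```python
# Effective cost of a benchmark run = tokens used x price per token.
# Token counts are the Artificial Analysis figures quoted above;
# the $/1M output-token prices are HYPOTHETICAL placeholders.
models = {
    # name: (benchmark tokens, assumed $ per 1M output tokens)
    "DeepSeek v4 Pro": (190e6, 1.0),
    "Kimi K2.6":       (170e6, 1.5),
    "GPT 5.5 (high)":  (45e6, 10.0),
}

for name, (tokens, price_per_m) in models.items():
    cost = tokens / 1e6 * price_per_m
    print(f"{name}: {tokens / 1e6:.0f}M tokens -> ${cost:,.0f}")
```

Even with these made-up prices, you can see how a 4x token overhead eats into a 10x price advantage.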
Sure that can happen but it hasn’t been my experience. I just spent a whole day using it for some pretty hefty refactors, many rounds of back-and-forths, thousands of lines of code changes, reviews, investigations, many subagents running parallel tasks, the works. Total cost $0.95, altogether.
I had attempted this with Opus 4.6 in the past and it burned through the $10 budget I’d given it before it returned from my initial prompt.
Even if it’s heavily discounted, it would still have cost me single digits for a complete solution vs double-digits for exactly nothing.
From the pricing page of deepseek:
(3) The deepseek-v4-pro model is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC.
Was this taken into account when reviewing the model?
Obviously everyone subsidizes for user acquisition -- after all, people need to be coaxed into testing your model; Claude Code subscriptions come to mind.
DeepSeek pro is 65/86% cheaper (i/o tokens) in subsidized pro vs pro and 91/97% cheaper with current subsidies.
Flash vs Sonnet 4.6 is 95/98%
Yeah, even the Chinese open models have the problem that inference costs aren't that cheap. The only way out of an AI bubble collapse is more efficient hardware at lower cost and cheaper infrastructure buildout.
It's just an introductory price to speed up adoption for the rest of the month; hardly worth mentioning compared to subsidized coding plans.
We know DS runs profitably; they also indicate in their paper that they expect prices to drop as they get access to the next-gen Huawei cards.
I'm not sure I'd call it "almost on the frontier," but I do think that v4 Pro is the most usable coding model I've seen out of China. I've used it via Ollama Cloud (coding) and OpenRouter (data processing). Feels Sonnet-level to me -- solid at implementation when given a specification, but falls a good bit short of Opus 4.7 max thinking when planning out larger changes or when given open-ended prompts.
Have you given GLM 5.1 or Kimi K2.6 a shot for coding? They outperform Deepseek v4 pro.
Keep in mind that DeepSeek has a max thinking mode of its own in the API.
I've been using v4 Pro for the past few days, and honestly, in terms of quality it seems more or less on par with OpenAI's 5.4 or Opus 4.6 (I haven't tried 4.7).
To be clear, I'm not doing state-of-the-art stuff. I mostly used it for frontend development, since I'm not great at that and just need a decent-looking prototype.
But for my purposes it's a perfectly good model, and the price is decent.
I can't wait for an open model small enough for me to run locally to come out, though. I hate having to rely on someone else's machines (and getting all my data exfiltrated that way).
You can use Tinfoil for inference, which lets you use the model in the cloud while getting similar privacy as running locally: https://tinfoil.sh/inference.
Disclaimer: I'm the cofounder. This works by running the model inside a secure enclave (using NVIDIA confidential computing) and verifying that the open source code running inside the enclave matches the runtime attestation. The docs walk you through the verification process: https://docs.tinfoil.sh/verification/verification-in-tinfoil
Thanks for sharing your experience, I’m looking to try it out.
Which provider are you using for inference? Opencode or the DeepSeek api?
I'm currently paying for Anthropic's Max subscription (the 100 USD one) and I quite often hit or approach the 5 hour limits, but usually get to around 60-80% of the weekly limits before they reset (Opus 4.7 with high thinking for everything, unless CC decides to spawn sub-agents with Haiku or something).
Those tokens are heavily subsidized, but DeepSeek's API pricing is looking really good. For example, with an agentic coding setup (roughly 85% input, 15% output and around 90% cache reads) I'd get around 150M tokens per month for the same 100 USD. Even at more output tokens and worse cache performance, it'd still most likely be upwards of 100M.
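As a sanity check on that ~150M figure, here's the blended-rate arithmetic. The traffic mix (85% input, 15% output, 90% cache reads) comes from the comment above, but the per-million-token prices are assumptions chosen for illustration, not DeepSeek's quoted rates:

```python
# Rough token budget for an agentic-coding traffic mix.
# Mix is from the comment above; all prices are HYPOTHETICAL.
budget_usd = 100.0
input_frac, output_frac = 0.85, 0.15  # share of tokens that are input vs output
cache_hit_frac = 0.90                 # share of input served from prompt cache

# Assumed $ per 1M tokens (placeholders, not real rates):
price_cache_hit, price_cache_miss, price_output = 0.14, 1.10, 3.00

def cost_per_million_mixed() -> float:
    """Blended $ cost of 1M tokens at the mix above."""
    input_cost = input_frac * (
        cache_hit_frac * price_cache_hit
        + (1 - cache_hit_frac) * price_cache_miss
    )
    return input_cost + output_frac * price_output

millions = budget_usd / cost_per_million_mixed()
print(f"~{millions:.0f}M tokens for ${budget_usd:.0f}")
```

With these placeholder prices the blend works out to roughly $0.65 per million tokens, i.e. on the order of 150M tokens for $100; worse cache performance raises the blended rate and shrinks the budget accordingly.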
What would the non-subsidized price for a V4 API be? Can it be priced 3x cheaper than bigger models? On OpenRouter, this 1600B-param model costs $0.4, whereas Kimi 2.6 (1000B params) is $0.7 and GLM 5.1 (754B params) is $1.0.
Here are their pricing docs; they're running a discount for now: https://api-docs.deepseek.com/quick_start/pricing/
My 150M estimate is for 100 USD at the regular prices (though even that needs sufficient cache hits). I think Anthropic subsidizes way more per token, though.
Someone on Twitter got >200M tokens for around $10 at the current pricing level
So it begins.
The pelican is really getting old as a standalone evaluation metric. By now pelicans are certainly in the training set, if models aren't explicitly tuned to produce them for the press on HN alone.
Keep the pelican, but isn't it time to add something else, more novel, that all current and past models struggle with?
It also seems like all of the models have converged on very similar images.
Relevant: https://news.ycombinator.com/item?id=47839493
Has anybody used V4 hard, for the most challenging tasks (agentically, locally)? It's so hard to compare without putting serious time into it -- like spending a year daily with the model.
I tried it for two tasks using Claude Code, on max effort.
1. A web platform: asking it to analyse a feature for creating reports and to come up with a better solution and better UX. It did great; I would say on par with Sonnet 4.6 or even Opus, considering the thinking and explanation.
2. A Mac app with some basic functionality: it did well from a functional perspective, but when I then used Opus 4.7 to evaluate and suggest improvements, I noticed it had missed many vital points in the design system and usability.
I think it's a leap; I haven't used a model this capable that isn't from OpenAI or Anthropic.
I recently switched from Claude to OpenCode Go + pi.dev. It has DeepSeek v4 Pro along with Kimi K2.6, and it's performing quite well for basic coding, without hitting any limits.
I tried DeepSeek v4 through OpenCode over the weekend. I'm a daily Claude/Claude Code user.
I tried to build something simple and while it got the job done the thinking displayed did not fill me with confidence. It was pages and pages of "actually no", "hang on", "wait that makes no sense". It was like the model was having a breakdown.
Bear in mind OpenCode was also new to me, so I could just be seeing thinking where I usually don't.
I feel the reasoning might be tuned for hard questions rather than agentic work. It overthinks: good for a very hard question, not for small incremental agentic steps. In theory, disabling thinking and using really well-formed instructions, while still forcing it to emit a bunch of tokens each step before taking action, could help. Only one way to find out, though.
Before CC and Codex removed the verbose thinking display and hid most of it, both did the same thing.
I see similar things using GLM 5.1 in pi.
I had to turn off thinking traces because looking at them was just giving me anxiety.
> Bear in mind open code was also new to me so I could be just seeing thinking where I usually don't
Well there's your problem.
Edit: I remember seeing similar things with ChatGPT or Codex, although I can't remember in which context.
The V3/R1 era and now stand in such contrast. V3/R1 were hyped hard and barely usable for coding. V4 is much less hyped, but (anecdotally) it has completely demolished all the Flash/Lite/Spark models.
Because V4 doesn't even beat Kimi K2.6 and GLM 5.1, which have been out longer. It's only talked about as much as it is because it's DeepSeek, and R1 was the first open source reasoning model. V4 isn't even multimodal (unlike Kimi), and the 1M context doesn't seem to perform particularly well.
Huh? R1 was one of the earliest openly available MoE and reasoning models; that's definitely not "hype". People tried to do reasoning before by asking the model to "think it through step by step", but that was a hack. The later V3.1 and V3.2 releases, AIUI, unified reasoning and non-reasoning use under a single model.
I'm surprised that people here don't care at all about these models openly training on your data, especially if you use them straight from the model developer. Whereas things like "GitHub now automatically opts everyone into using their code for model training" get hundreds of justifiably angry comments, I never see this brought up anymore on posts like these talking about using Chinese models through OpenRouter. This might be explained by "well they're different people", but the difference is very stark for that to be the whole explanation.
You definitely have a bone to pick. Chinese researchers have given the world the cheapest and most consistently high-quality research around LLMs. They don't pretend; they do the work and release the goodies. Mostly so cheap that everyone in the world has a chance to use close-to-frontier models. Why would you respond with "anger"?
Let us know what your real complaint is, and let's not feign indignation at open models and research.
You're making completely unfounded assumptions about me. I use Chinese models myself.
I am personally okay with helping them as long as they publish the models and don't keep them closed. And I don't trust the settings where providers say they won't train on it.
Because they give it away for free and offer APIs at very acceptable rates. Not that hard to figure out; Robin Hood stealing our data tax back comes to mind.
GitHub is free.
User publishes to GitHub => Copilot trains on GitHub data => MS sells Copilot => user works for Microsoft (in the sense of giving their labour for MS to make money).
User publishes to GitHub => DeepSeek trains on GitHub data => DeepSeek gives the model away for free => user did not work for DeepSeek (in the sense of giving their labour for DeepSeek to make money).
In the first case MS is giving part of Github itself away for free.
Exactly, it's intuitively different.
The cool thing about open-weights model is that you are free to use alternative providers that won't phone home to the original model creators.
I see 6 alternative providers listed on Openrouter for DeepSeek V4 Pro for example.
At least that’s what they’re telling you. It’s a ”trust me bro” scenario.
I'd rather use the phone-home version (DeepSeek's own endpoint). The benefit is that I'm fairly certain they actually host the model I'm paying for.
What do you mean specifically? Data passed through OpenRouter? Or that they too indiscriminately ingest data all over the web? If the former, I assume it's just that anyone still using them just doesn't care where the data comes from. If the latter, well, it seems like every day there's some news on some new model from somewhere, and it takes dedication to complain every time. There's also the factor that I believe DeepSeek is more open with the model, while others keep it entirely proprietary, which feels fairer and (personally) is also less offensive.
If the data is open source on GitHub, then in my opinion it should be fair game.
IMO this is unfair to GPL or similarly licensed code.
It seems OK for MIT-like licensed code, though.
It's totally fair to use GPL code; it just means all the models built by Anthropic, OpenAI, etc. using GPL-licensed source are themselves bound by the GPL, plus any works created downstream using those AI tools.
We're on the verge of a golden age of software as soon as someone finds a court with courage.
Ah, you have much more faith in the legal system than I do. It's nice to dream, though.
I think AI will create an open source dark age. Gradually, we'll see a lot less good new open source code: a gradual shift back to the proprietary world, similar to the 1950-1990 period.
Things being public should not be enough. Just because someone leaked your medical information to the public via a data breach shouldn't make it fair game. There should be some rules.
I feel that's a false dichotomy. The code on github is freely available for people to read and learn from, leaked medical data isn't.
As opposed to?
Do you really think OpenAI, Anthropic or any other entity in the same business respects your data?
The Chinese AI companies who release open weights actually deserve whatever input you give them. They are the reason why there is competition and not duopolies in the domain.
I think Google, and likely Anthropic, indeed do honor the settings chosen by the user. For Google in particular it'd be very surprising if they didn't. That's also why both do everything they can to trick users into allowing it.
OpenAI, I wouldn't be surprised if you were right.
AWS Bedrock has DeepSeek models running on their infrastructure. That should be enough to prevent training on user data (there's a markup compared to DeepSeek's pricing though).
And unfortunately AWS doesn't have prepaid billing, so you can't just give the internet access to your API key without getting FinDDoS'd.
If anyone is looking for a solution in this space, fire me an email; I have a partner who's focused closely on that problem set!
The latest one available for serverless inference looks to be from 8 months ago (DeepSeek v3.1), which is an eternity and far behind.
My policy is that I don't allow agents to access all code. Some of it is shielded behind bind mounts. Maybe this is a pathetic, artisanal (or ego-driven) reaction of mine to the inevitable. I allow them to work on about 90% of the code (most codebases fully), with some code being considered too valuable to expose to the vendor. When data is involved, LLMs only get to see anonymized data.
This cute policy of mine won't affect anything, though. The more we use the models, the more the models will replace this kind of work. Centralisation of power is inevitable: in Medieval Europe, state and church ruled; in modern times, but before the internet, it was probably state and banks. Maybe, with ongoing digitization (bank offices disappearing) making banks less costly to operate, combined with bank bailouts, governments will fully nationalize banks, or at least banks will consolidate.
Then the AI companies will consolidate with the internet information and communication companies (Google/Meta for the US, and Alibaba/Tencent for China). Maybe we'll end up with a few de facto governmental megacorps that rule in tandem and in close cooperation with the formal government, which might handle mostly infra, utilities, and the army. The megacorps would control the narrative more and take on more of a paternal role (educating and protecting the citizens, normally handled by formal governments).
Does this make sense?
Two factors. The first is anti-Americanism (or at least anti-American-capitalism).
But the more important one is the social contract. GitHub came long before the LLM era. Its branding is as the home of open source projects, and many users want it to stay away from the AI hype. You wouldn't expect LLM providers to stay away from AI hype (duh), so it's less of an issue for them.
If they give me the resulting model in the end, they can train on my data all they want. Hell, I'll send them more of it.
I tweeted about some implementation and review runs that used V4 Pro.
Even without the currently discounted pricing, the value is incredible.
It takes about twice as long to finish code reviews given an identical context compared to Opus 4.7/GPT 5.5, but at 1/10 the cost or less, there's just no comparison.
https://twitter.com/aljosa/status/2049176528638902555
Did you do this test through OpenRouter?
I doubt if those models already knew this pelican test...
If I want to run 'coding prompts' with the biggest DeepSeek model on CPU, what is the order of time I will have to wait: hours, days?
DeepSeek V4 Pro has about 25GB worth of active parameters, so if you can fit the whole ~870GB weights + cache in RAM your tok/s is bounded above by 25GB divided into your system memory bandwidth in GB/s. If you can't fit your whole model in RAM you'll be bottlenecked to some degree by storage bandwidth which is in the single or low double digits in GB/s.
Mind you, it's an absolutely sensible setup either way if you're just testing a few queries and are willing to run them unattended/overnight. Especially since the KV-cache size is apparently really low (~10GB is said to be typical) so you get a lot of batching potential even in consumer setups, which amortizes the cost of fetching weights.
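The bound described above is just memory bandwidth divided by the bytes streamed per token. A minimal sketch of that arithmetic; the 25GB active-parameter figure comes from the comment above, while the bandwidth numbers are ballpark illustrations, not measured values:

```python
# Upper bound on decode speed when the full weights fit in RAM:
# every generated token must stream the active parameters once,
# so tok/s <= memory_bandwidth_GB_s / active_param_GB.
ACTIVE_PARAM_GB = 25.0  # active params per token, per the comment above

def max_tokens_per_sec(bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / ACTIVE_PARAM_GB

# Illustrative systems (bandwidths are rough, not measured):
for name, bw in [("dual-channel DDR5 desktop", 90),
                 ("8-channel DDR5 server", 350),
                 ("Apple M-series Ultra", 800)]:
    print(f"{name}: <= {max_tokens_per_sec(bw):.1f} tok/s")
```

This is an upper bound only; real throughput is lower once compute, KV-cache traffic, and (if the weights don't fit in RAM) storage bandwidth enter the picture, though batching amortizes the weight streaming across requests as noted above.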
So, I'm involved in an open source AI CLI coding assistant called Cecli (cecli.dev), which is specifically designed to work well with DeepSeek.
DeepSeek is a great model, and Cecli is all about efficiency. It works great for my purposes - agentic programming on a budget.
https://www.reddit.com/r/Hugston/comments/1t1mk0j/comparison...