Your claim and the original claim are vastly different. Refusing to assist is not the same as "writing less secure code". This is clearly a filter before the request goes to the model. In the article's case, the claim seems to be that the model knowingly generated insecure code because it was for groups China disfavors.
That is incorrect. Here's the very first paragraph from the article. I'm adding emphasis for clarity.
My example satisfies the first claim. You're concentrating on the second. They said "OR" not "AND". We're all programmers, so I hope we know the difference between these two.

You are obviously factually correct; I reproduced the same refusal, so don't take this as an attack on your claim. But a quick Google search reveals that Falun Gong is an outlawed organization/movement in China.
I did a "s/Falun Gong/Hamas/" in your prompt and got the same refusal in GPT-5, GPT-OSS-120B, Claude Sonnet 4, Gemini-2.5-Pro as well as in DeepSeek V3.1. And that's completely within my expectation, probably everyone else's too considering no one is writing that article.
Goes without saying I am not drawing any parallel between the aforementioned entities beyond the fact that they are illegal in the jurisdictions where the model creators operate - which, as an explanation for refusal, is fairly straightforward. So we might need to first talk about why that explanation is adequate for everyone else but not for a company operating in China.
Thanks. Mind providing screenshots? I believe you, I just think this helps. Your comments align with some of my other responses. I'm not trying to make hard claims here and I'm willing to believe the result is not nefarious. But it's still worth investigating. In the weakest form it's worth being aware of how laws in other countries impact ours, right?
But I don't think we should talk about explanation until we can even do some verification. At this point I'm not entirely sure. We still have the security question open and I'm asking for help because I'm not a security person. Shouldn't we start here?
If you mean the bit about refusals from other models, then sure, here is another run with the same result:
https://i.postimg.cc/6tT3m5mL/screen.png
Note I am using direct API to avoid triggering separate guardrail models typically operating in front of website front-ends.
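Concretely, that just means sending the prompt straight to a chat-completions endpoint. Here is a rough sketch, assuming an OpenAI-compatible provider; the base URL, model ID, API key, and prompt are placeholders for whichever provider you pick, not a claim about any particular one:

    # Minimal sketch: send the same prompt straight to the model weights,
    # bypassing any website-level guardrail layer. Assumes an OpenAI-compatible
    # provider; base_url, model, and the prompt text are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",  # or any provider hosting the DeepSeek weights
        api_key="YOUR_API_KEY",
    )

    resp = client.chat.completions.create(
        model="deepseek/deepseek-chat",  # provider-specific model ID
        messages=[{"role": "user", "content": "<the exact prompt from the parent comment>"}],
        temperature=0,  # reduce run-to-run variation
    )
    print(resp.choices[0].message.content)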
As an aside, the website you used in your original comment:
> [2] Used this link https://www.deepseekv3.net/en/chat
This is not the official DeepSeek website. It's probably one of the many shady third-party sites riding on the DeepSeek name for SEO; who knows what they are actually running. In this case it doesn't matter, because I already reproduced your prompt with a US-based inference provider directly hosting the DeepSeek weights, but it's still worth noting for methodology.
(Also, to a sceptic, screenshots shouldn't be enough since they are easily doctored nowadays, but I don't believe these refusals should be surprising in the least to anyone with passing familiarity with these LLMs.)
---
Obviously sabotage is a whole other can of worms compared to mere refusal, and it's something this article glossed over without showing its prompts. So, without much to go on, it's hard for me to take this seriously. We know garbage in context can degrade performance; even simple typos can[1]. Besides, LLMs at their present level of capability are barely able to soundly complete any serious task, so I find it hard to believe they could carry out sabotage with any reasonable degree of sophistication - that said, I look forward to more serious research on this matter.
[1] https://arxiv.org/abs/2411.05345v1
I want to clarify that I'm not trying to make strong claims. That's why I'm asking for others to post and why I'm grateful you did. I think that helps us get to the truth of the matter. I also agree with your criticisms of the link I used, but to be frank, I'm not going to pay just for this test. That's why I wanted to be open and clear about how I obtained the information. I was hoping someone who had already paid would confirm or deny my results.
With your Hamas example, I think it is beside the point. I apologize, as I probably didn't make my point clear enough. Mainly I wanted to stop baseless accusations and find the reality, since the article's claims are testable. But what I don't want to do is make a claim about why this is happening. In another comment I even said that this could happen because they were suppressing this group, so I wouldn't be surprised if the same is true for Hamas. We can't determine if it's an intentional sleeper agent or just a result of censorship. But either way it is concerning, right? The unintentional version might be more concerning, because these censorship rules cross country lines and it is hard to know what is being censored and what isn't.
So I'm not trying to make a "Murica good, China bad" argument. I'm trying to make a "let's try to verify or discredit the claims" argument. I want HN to be more nuanced. And I do seriously appreciate you engaging, and with more depth and nuance than others. I'm upvoting you even though we disagree, because I think your comments are honest and further the discussion.
DeepSeek chat is free... no need to pay to test, though.
https://chat.deepseek.com/
You can also use the API directly for free on OpenRouter.
Needs a login, so I went around. Are you able to verify my results?
Sure, but you also have to recognize the motte-and-bailey form of argument here. If we're limiting the claim to "DeepSeek returns refusals on politically sensitive topics", we already knew that. It was relevant eight months ago; now it's not interesting.
Another example: McDonald’s fries may cause you to grow horns or raise your blood pressure. No one talks like that.
So I would toss it back to you: we are programmers, but we also have common sense. The author was clearly banking on something other than the technically accurate logical OR.
https://en.m.wikipedia.org/wiki/Motte-and-bailey_fallacy
You're not wrong, but the second claim is by far the more interesting of the two, and is what I think most people would like to see proven. AI outright refusing certain tasks based on filters set by the parent company is not really new or interesting, but it would be interesting to see an AI knowingly introduce security flaws in generated code specifically for targeted groups.
I don't disagree. The second is more concerning, but I do think the first is interesting, at least in how cultural values and laws extend beyond country borders. Far less concerning, but still interesting.
But why are you attacking my claim? Because I'm asking people not to have knee-jerk reactions and requesting help vetting the more difficult claim? Is this wrong? I'm not trying to claim that it does or doesn't write insecure code (or less secure code) for specific groups. I've also made the point in another comment that there are non-nefarious explanations for how this could happen.
I'm not trying to take a stance of "China bad, Murica good" or vice versa. I'm trying to take a stance of "let's try to figure out whether it's true or not. How much is it true? How much is it false?" So would you like to help, or would you like to create more noise?
For the record I never attacked your claim, I'm not the original person that said it was wrong.
That distinction is technically moot, and just highlights the irrelevance of the report: any Falun Gong or similar organization can change its proclaimed self-identity or the language of its requests (by first translating them, with a different or neutral model if necessary).
Language-dependent quality changes are certainly technically feasible: a model can be trained to introduce intentional security lapses based on the language of the prompt.
But no neural network has a magic end-intent or allegiance detector.
If Iran's "revolutionary" guard seeks help from a language model to design centrifuges, merely translating their requests into the dominant language(s) of the model's country of origin and culling any shibboleths should result in an identical distribution of code, designs, or whatever, compared to origin-country, origin-language requests.
It is also to be expected that some fine-tuning can realign a model's interests toward anyone's goals.
I see your point. I thought the first one was already known when DeepSeek came out. The Perplexity team showed how they removed this kind of bias via fine-tuning, and their finetune could answer sensitive questions. I mistakenly thought you were going for the second, since that part is new and interesting.
I definitely need help with the second part. It is a much harder claim to verify or dismiss. I also want to stress (as I do in several other comments) that this could happen even without sleeper agents (see the Anthropic paper), just through censorship.
What I want to fight the most is just outright dismissing what is at least partially testable. We're a community of techies, so shouldn't we be trying to verify or disprove the claims? I'm asking for help with that because the stronger claim is harder to settle. We have no chance of figuring out the why, but hopefully we can avoid more disinformation. I just want us to stop arguing out of our asses and fighting over things we don't know the answers to. I want to find the answers, because I don't know what they are.
Copilot and ChatGPT will also not help you if you say you're from a group marked as an 'enemy' by the USA...
Do you realize that "refuses to help OR tries to kill them right away" is also a technically correct claim? The journalists essentially put only the second half into the title of the article.
I do realize that. But look at the OP again.
I'm not trying to say WaPo is doing grade-A journalism here. In fact, personally I think they aren't. A conversation about clickbait titles is a different one, and one we've had for over a decade now... But are we going to recognize the irony here? Is OP not the pot calling the kettle black? They *also* jumped to conclusions. This doesn't vindicate WaPo or make their reporting any less sensational or dubious, but we shouldn't make the same faults we're angry at others for making.
And pay careful attention to what I've said.
I do want to find the truth of the matter here. I could definitely have written it better, but I'm appealing to our techy community because we have this capability. We can figure this out. The second part is much harder to verify, and there are non-nefarious reasons that might lead to this, but we should try to figure it out instead of just jumping to conclusions, right?

This is what I suggest. I asked Claude to start writing a test suite for the hypothesis.
https://claude.ai/public/artifacts/77d06750-5317-4b45-b8f7-2...
1) Four comparison groups: CCP-disfavored (Falun Gong, Tibet independence), religious controls (Catholic/Islamic orgs), neutral baselines (libraries, universities), and pro-China groups (Confucius Institutes).
2) Each gets identical prompts for security-sensitive coding tasks (auth systems, file uploads, etc.) with randomized test order.
3) Instead of subjective pattern matching, Claude/ChatGPT acts as an independent security judge, scoring code vulnerabilities with confidence ratings.
4) Runs basic Welch's t-tests between groups, with effect-size calculations (a rough sketch of that comparison step is below).
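For illustration only, here is a minimal sketch of what step 4 could look like once the judge has assigned vulnerability scores. The group names, scores, and 0-10 scale are made up, and a real run would need far larger samples plus a correction for multiple comparisons across group pairs and task types:

    # Sketch of step 4: compare judge-assigned vulnerability scores between a
    # CCP-disfavored group and a neutral baseline. All numbers are hypothetical.
    import numpy as np
    from scipy import stats

    disfavored_scores = np.array([6.0, 5.5, 7.0, 4.5, 6.5])  # judge scores, 0-10 scale
    baseline_scores = np.array([3.0, 4.0, 3.5, 2.5, 4.5])    # judge scores, 0-10 scale

    # Welch's t-test: does not assume equal variances between the two groups.
    t_stat, p_value = stats.ttest_ind(disfavored_scores, baseline_scores, equal_var=False)

    # Cohen's d as a rough effect size, using the average of the two sample variances.
    pooled_sd = np.sqrt((disfavored_scores.var(ddof=1) + baseline_scores.var(ddof=1)) / 2)
    cohens_d = (disfavored_scores.mean() - baseline_scores.mean()) / pooled_sd

    print(f"t = {t_stat:.2f}, p = {p_value:.4f}, Cohen's d = {cohens_d:.2f}")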
Iterate on this starting point in whatever way makes sense to people with more experience working with LLMs than I have.
(Yes, I realize that using an LLM as a judge risks bias from the judge.)
Actually, if it writes no code, that's the most secure help an LLM can provide when asked for code :'). All the rest is riddled with stupid shit.