Can anyone tell me why Anthropic is releasing this information? I understand that there is inherent risk, but they are a business at the end of the day -- so is this a way to coerce others into better behavior and have the industry self-regulate with better modeling/protections, or is it just the R&D team promoting strong moral integrity, which boosts hiring?
There is clearly a strategy here - and I'm trying to figure it out.
Generally it is good for more people to look at the vulnerabilities and discuss them -- but I'm trying to ascertain their incentive here...
Financially, it's a bit of a wash, because this affects their competition just as much as it affects them. Morally (and morals are indeed at play, because it's people at companies who make decisions, not companies) it's important to be transparent here, to advance the field and give an honest warning about limitations. Financially again, maybe it's in Anthropic's best interest for more people to be equipped with complete information, in the hope of overcoming the limitation sooner.
>Financially, it's a bit of a wash because this affects their competition just as much as it affects them.
Not if they are selling it as a zero-day exploit (ZDE).
>> I'm trying to ascertain their incentive here...
It's good for their mission and business.
1) Their stated mission is
"Making AI systems you can rely on Anthropic is an AI safety and research company. We build reliable, interpretable, and steerable AI systems" - https://www.anthropic.com/company
2) They've increased their credibility.
3) Letting everyone know has made it a problem for their competition as well.
In addition to what the others have said about positioning themselves as the knowledgeable ones:
Anthropic has, since the beginning, also been trying to position itself (at least from a marketing perspective) as the moral or ethical choice. Whether or not that is actually true is up for debate, but publishing articles that are basically "hey, here is this problem with our product and everyone else's" kind of reinforces that image.
They want to sow distrust in open source. 'You can't trust open source because no one is cleaning the training data'.
Even though in reality it is practically impossible for any team to clean such a 'needle in a haystack' out of this data.
Of the 13 authors, 3 are at Anthropic. Of the 4 core contributors, 1 is at Anthropic.
Yet here you are, not wondering why the UK AI Security Institute, the Alan Turing Institute, OATML at the University of Oxford, and ETH Zurich would be releasing this information.
So I suppose the press release did the job it was supposed to do.
(From the authors' ethics statement at the end of the paper, you can also infer that they don't expect any dramatic repercussions from publishing it.)
Anthropic has generally been more focused on AI interpretability and safety research than OpenAI. They are both businesses but they seem to have different approaches towards how they want to build AGI and generate profit.
I believe it's intended to convince the audience that they are experts, that this type of thing is dangerous to a business, and that they are the ones doing the most to prevent it. There is no explicit statement to this effect, but I get the sense they are saying that other vendors, and especially open models that haven't done as much work to curate the data, are vulnerable to attacks that might hurt your business.
Also a recruiting and branding effort.
All of this is educated guessing, but that's my feeling. I do think the post could have been clearer about describing the practical dangers of poisoning. Is it to spew misinformation? Is it to cause a corporate LLM-powered application to leak data it shouldn't? Not really sure here.
Got it - positioning themselves as the responsible adult in the room. That has some merit in the wild west that is AI right now. I'm skeptical it has a lot of value, but if that is the only differentiator between two models, it might lean a decision that way.
Generally, yes, companies do blog posts for marketing.
It gets a bit... missing the forest for the trees?... when viewed solely through the lens of "cui bono? give me one singular reason" - for example, I've written blog posts for big companies that were just sharing interesting things.
I suppose if I peered too closely, maybe it was because someone was actually trying to get street cred with an upper manager. Or maybe to get a chance to flirt with their crush in marketing. Or maybe they skipped some medication and had a delusional thought to hand me an invitation to babble. :)
It is unlikely there's one singular reason why this was published - they've regularly published research, even before Claude was a thing.
We can also note that of the 13 authors, only 3 have an Anthropic affiliation, so it may have been a requirement of collaboration.
My guess is that they want to push the idea that Chinese models could be backdoored, so that when they write code and some trigger is hit, the model could make an intentional security mistake. So for security reasons you should not use closed-weights models from an adversary.
Even open weights models would be a problem, right? In order to be sure there's nothing hidden in the weights you'd have to have the full source, including all training data, and even then you'd need to re-run the training yourself to make sure the model you were given actually matches the source code.
Right, you would need open source models that were checked by multiple trusted parties to be sure there is nothing bad in them, though honestly, with such a large quantity of input data, it could be hard to be sure that no "poison" was already placed in it. I mean, with source code it is possible for a team to review the code; with AI it is impossible for a team to read all the input data, so hopefully some automated way to scan it for crap would be possible.
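For what it's worth, the literal-string version of such a scan is trivial to write; the hard part is that real poison won't contain a trigger you already know about. A minimal sketch (the trigger list, file layout, and paths here are hypothetical, not anything from the paper):

    # Scan a text corpus for suspected trigger strings before training.
    # Only catches literal, known-in-advance triggers; novel or obfuscated
    # poison will sail right through.
    import pathlib

    SUSPECTED_TRIGGERS = ["<EXAMPLE_TRIGGER>"]  # hypothetical placeholder list

    def scan_corpus(corpus_dir):
        """Yield (file, line_number, line) for every line containing a suspected trigger."""
        for path in pathlib.Path(corpus_dir).rglob("*.txt"):
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, line in enumerate(f, start=1):
                    if any(t in line for t in SUSPECTED_TRIGGERS):
                        yield str(path), lineno, line.strip()

    if __name__ == "__main__":
        for file, lineno, line in scan_corpus("training_data/"):
            print(f"possible poison: {file}:{lineno}: {line[:80]}")

Anything more realistic (unknown triggers, paraphrased poison, backdoored code examples) quickly becomes an open research problem, which is roughly the point above.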
Maybe their model is under attack and they are releasing the problem so that others learn how to exploit this against other LLM providers, thus leveling the field while they find a solution to the problem.
It looks suspicious, I agree. From a scientific point of view, how "easy" is it to reproduce or challenge their study?