> Distillation is NOT an attack.
From the article -
> 28.8 million exchanges with Claude through almost 25,000 fraudulent accounts
wouldn't that be considered an attack? Not sure what I'm missing here.
An attack against what? The sanctity of "their IP" that is itself the result of a massive copyright violation campaign?
Has it been proved in a court of law that it is a copyright violation?
In some cases if the model regurgitates the original material then that is clearly copyright violation, but if the model "learns" from the source material just like a human brain would then that's not a copyright violation.
No, what was proved in court was that they downloaded and trained on millions of pirated books. The court said their use of books is fair use, but stealing them isn't.
I think we're going to see cases that find distillation is also fair use. You're using the competing model like a book. You pay for it, you use it (read it), it informs your model, but you aren't repeating/reselling what the model told you verbatim. Foreign labs may still run afoul of competing labs' Terms of Service, and they may also pay a settlement (or not, it's a different jurisdiction after all), but the damage is already done. Distillation will become uncontroversial when done legally.
Are LLMs even copyrightable? If not, no need to speculate fair use.
Then distillation isn't a violation either by extension.
I would agree, if they are inspecting static output of American AI models without using their compute resources.
Scraping the internet for training is also using compute resources.
Aren't they buying the use of these resources just like any other customer?
It's a 'too big to fail' model. Because they have a big swinging dick, enforcing all the copyright and other restrictions they violated would nuke them from orbit, so we can't actually hold them to account for it... for some fucking reason.
> Has it been proved in a court of law that it is a copyright violation?
God I'm so tired of this.
The billion dollar companies have the ability to hire an army of lawyers to DDOS the legal system. They at most pay a slap-on-the-wrist fine as the cost of doing business.
Ddos is a great framing of this :)
I'm extremely pro free markets etc, but the uncomfortable truth is Anthropic stole the work of thousands of authors for profit. I think it will end one of my favourite things in life: programming books.
If you have ever made a painting and sold it, then you too profited from the work of thousands of artists. How so? Because your sense of what is art came from those who preceded you. You have seen the works of Picasso, Rembrandt, Monet and so on and your brain absorbed from their work, just like an LLM.
If an LLM generalizes from thousands of authors then it is no different from what your brain does.
Even if you disregard training costs, pure inference costs are a problem, which is the same reason other APIs have rate limits. This is an attack to bypass the rate limit.
Be careful to properly identify the bad behavior. A customer who buys a product for less money than it cost to produce has not necessarily done anything wrong. They just took advantage of a loss leader. That's on the seller.
Did you notice that when Valve was displeased about scalpers, Valve changed Valve's behavior?
It doesn't seem reasonable to complain that a customer of your AI service received that service for less money than it cost you to provide that service. I don't think that is the complaint here at all. If that was the issue, they could just raise their price.
As most everybody seems to notice, this is just a reenactment of what was once written for comedic effect: "You're trying to kidnap what I have rightfully stolen!"
Perhaps an arrangement can be reached.
https://clip.cafe/the-princess-bride-1987/youre-trying-kidna...
Still calling it an "attack" feels like a stretch.
They literally had to pay for that "attack", no matter how many accounts they used.
Google was killing many websites for decades with their crawlers. Most large websites decided to create dedicated infrastructure for their traffic alone. Somehow they didn't participate in that cost and were not called the attackers.
> and were not called the attackers.
This is the mental leap I'm struggling with here. Did you not live through that era where they were explicitly and repeatedly called out as 'attacks'? They were generally tolerated/hardened against as they provided value in discoverability.
Ding. Ding. Ding. "Provided value to the content author". AI scraping negatively impacts the content author with zero compensation. There is no mutual benefit.
Just to ensure you don't gaslight yourself - I did live through that era and I worked on and supported a niche community (a MUD) where we did a lot of work encouraging marketing and discoverability through MUD forums as well as making sure our page was accurately and minimally keyword tagged and highly available for indexers.
In the time since that era search engines have transformed into platforms themselves that do engage in more parasitic behavior but it's important not to assume that the way it is now is how it always was - that's a rather defeatist path to walk down where you ignore awareness of the fact that there can be a highly profitable non-enshittified search engine that supports, rather than destroys, the ecosystem it benefits from.
It was better and, if we're diligent, it can be better again.
> [Google] were not called the attackers.
They should be. But as the saying goes, one website/company dying is a "tragedy," lots of them dying at the hands of one company is a statistic of corporate growth. Or something like that.
And then of course when the tables turn on a company and they're the ones getting bombarded, they cry foul. Keep in mind Anthropic did many similar things that you mentioned Google did.
I think the term "attack" here is appropriate but not in the way Anthropic is framing it. Alibaba is clearly violating terms to extract data, so that's definitely not above board. But it's not like a DDOS attack where Alibaba is trying to attack Anthropic's servers. Alibaba is simply doing exactly what Anthropic did to the rest of the internet, just targeting Anthropic and paying them to do so.
It's merely a ToS violation.
My terms of service are that you are not allowed to breathe oxygen.
I am getting a bit tired of companies being able to have user-hostile, anticompetitive, monopolistic terms of service. The freedom we give them comes at the cost of our freedom as consumers to have free markets, because they lock them up.
Exactly, calling it “illicit” is funny. Your ToS isn’t law.
Illicit means maybe against the law but definitely against the rules, for example an illicit affair. The word for against the law is illegal, from Latin, or unlawful, from Germanic. I guess the Germanic cousin of "illicit" would be "forbidden."
Extramarital affairs are against the law in many countries and 17 US states. “Illicit affair” is potentially a holdover from when it was illegal more places, not just a conflating of against the rules with illegality.
https://en.wikipedia.org/wiki/Adultery_laws
That's violating TOS, spamming, possibly a DDOS, but the distillation in and of itself is not an attack; it's just using the model.
Like the difference between scraping a site with one or two active connections vs thousands. It's not the scraping that is an attack, it is how they are going about it.
> That's violating TOS, spamming, possibly a DDOS
As in distributed distillation of service?
Just sending a request to a service does not constitute an "attack". It seems that what Anthropic mean by "fraudulent account" is probably just one violating their terms of service - misuse of a subscription account, and/or the presumed nature of what the user was trying to do.
I guess Anthropic would regard any developer using their subscription plan with OpenCode to be operating a "fraudulent account", maybe an "attacker" too. Now we know how they think of anyone using Claude to develop software competing with Anthropic. Only an "attacker" would want to vibe code their own harness, or god forbid want to learn how to build/train an LLM.
Of course Anthropic's wording is intended to be deliberately provocative, since they are trying to manipulate the US government into shutting down the Chinese competition.
Attack or customer
Is an attempt to copy all or parts of a model an attack, when models have very questionable copyright status? Maybe? I don't think most people have much sympathy here though.
Let’s not forget that by the same logic, Anthropic et al are “attacking” copyright holders all around the world by scraping their data without authorization for training.
Pot calling kettle black.
Not only that, they're flooding websites daily with almost infinite numbers of requests for "web searches". DDoS-by-VC money.
I mean, I got 5 replies within a minute of asking, and none deny it's an "attack"; they simply say "good". HN should have better discourse.
https://en.wikipedia.org/wiki/Ken_McElroy