Hypocrisy is a form of corruption.

Anthropic's IP was created by harvesting and "distilling" other people's IP. Copyrighted materials, and the commons... which they have essentially privatized.

The commercial goal is to avoid competition. One of the main worries for AI is "commoditization," which has come to mean "not a monopoly." To that end, it doesn't matter if the competitor is Chinese, American, or other.

Their motivation here is clearly protectionism. The argument they make to politicians is national security. The legal argument is IP-theft, violation of service agreements or whatnot.

This is all very dangerous. Commercial interests repackaged as national security can lead to armed conflict.

> Anthropic's IP was created by harvesting and "distilling" other people's IP. Copyrighted materials, and the commons... which they have essentially privatized.

Anthropic and others argue that because LLMs don’t output full copyrighted works word for word, their LLMs aren’t infringing copyright law.

I think (if it ever comes to that) Chinese labs should use the same arguments against Anthropic.

UPDATE: this is a slight hyperbole of course, and it's not worth arguing over what they actually said. The point is the intent and the facts - "The Big LLMs" "distilled" collective knowledge, including copyrighted works, at unimaginable scale, but it's all kosher and totally not piracy/copyright infringement. Though if you're a teenager torrenting an mp3 - you'll get screwed.

> LLMs don’t output full copyrighted works word for word

Apparently they do, as per the evidence in the NYT vs OpenAI suit.

Isn’t the output of LLMs completely copyright-free in the US?

One lower court has said that the output of AI models is uncopyrightable.

But the real unsettled issue is whether model training is fair use, and where copyright infringement might creep into model output.

The Copyright Office itself also says this when it talks about determining authorship.

> Anthropic and others argue that because LLMs don’t output full copyrighted works word for word, their LLMs aren’t infringing copyright law.

That surely can't be what they argue, because I'm sure I can't translate a copyrighted book into a different language and say "that's fine, it's not word-for-word".

Bad China is stealing our stolen IP!

And putting it into free models like Qwen. It’s hard to care about this.

"Copyright violation of a published work" and "stealing private trade secrets" are in fact very different crimes.

Humans have spent millennia harvesting and distilling each other's IP - "the shoulders of giants" and all that, so it's an especially disingenuous take.

For something to be a trade secret, you have to actually keep it secret. If I get the ingredients of Coca-Cola from an ex-employee, I've stolen a trade secret. If I work it out by doing a chemical analysis, I've stolen nothing.

There is a difference with Anthropic, as no-one signs a licence agreement to buy a Coke. But Anthropic are also not saying you can't publish the output of their models. It's not clear to me if trade secret law will (or should) cover a secret which can be extracted from information that licensees are not restricted from publishing.

Wait, really? So why doesn't someone just reverse-engineer Coca-Cola like that? My understanding was that a "clean room" implementation is fine, but not reverse-engineering. If you can just copy everything on the market, why isn't someone already doing that?

In the case of Coca-Cola, it's because the use of coca leaves is highly regulated, since they also contain cocaine. There is a YouTuber who claims to have reverse engineered Coca-Cola, but he had to use tea tree oil instead of actual coca leaf extract.

Here's EFF on reverse engineering and the law: https://www.eff.org/issues/coders/reverse-engineering-faq

Historically a lot of competition in physical products was very much reverse engineering. Because you can buy them without signing your rights away. That's why companies are keen on patents and click-through agreements.

If you look at how "clean room" processes work, they are actually a form of reverse engineering. Also, the clean room technique exists to keep your new implementation from infringing copyright, not trade secrets.

The Coca-Cola formula was reverse engineered in 2026 by a sufficiently motivated individual.

Here it is.

Per liter of cola:

104 g sugar

1 mL Flavor Solution A

10 mL Flavor Solution B

Carbonated water to volume

Flavor Solution A (Essential Oils):

Dilute 20–21 mL of the following oil mixture to 1 L using 95% ethanol:

45.8 mL lemon oil

36.5 mL lime oil

8 mL tea tree oil (emulates decocainized coca leaf extract)

4.5 mL Cassia cinnamon oil

2.7 mL nutmeg oil

1.2 mL orange oil

0.7 mL coriander oil

0.6 mL fenchol

Flavor Solution B (Chemical and Color Base):

Dilute the following ingredients to a volume of 1 L using water:

320 mL Shank's caramel color or 190 mL Durkee caramel color

160 g glycerin

45 mL 85% phosphoric acid

10 mL vinegar (5% acidity)

10 mL vanilla extract

10 g wine tannins (emulates decocainized coca leaf extract)

9.65 g caffeine
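To put those dilutions in per-liter terms, here is a rough sketch of the arithmetic in Python. It only restates the numbers quoted above; the variable and function names are mine, and picking 20.5 mL as the midpoint of the "20–21 mL" range is an assumption, not something the recipe specifies.

```python
# Oil mixture for Flavor Solution A, in mL (values copied from the recipe above).
OIL_MIX_ML = {
    "lemon oil": 45.8,
    "lime oil": 36.5,
    "tea tree oil": 8.0,
    "Cassia cinnamon oil": 4.5,
    "nutmeg oil": 2.7,
    "orange oil": 1.2,
    "coriander oil": 0.7,
    "fenchol": 0.6,
}

# Assumption: midpoint of the recipe's "20-21 mL" range.
OIL_MIX_PER_L_SOLUTION_A_ML = 20.5
SOLUTION_A_PER_L_COLA_ML = 1.0
SOLUTION_B_PER_L_COLA_ML = 10.0
CAFFEINE_PER_L_SOLUTION_B_G = 9.65


def oil_microliters_per_liter_cola():
    """Microliters of each essential oil that end up in one liter of cola."""
    total_mix_ml = sum(OIL_MIX_ML.values())
    # mL of the oil mixture carried into one liter of cola via Solution A
    mix_in_cola_ml = OIL_MIX_PER_L_SOLUTION_A_ML * SOLUTION_A_PER_L_COLA_ML / 1000.0
    return {
        name: ml / total_mix_ml * mix_in_cola_ml * 1000.0
        for name, ml in OIL_MIX_ML.items()
    }


def caffeine_mg_per_liter_cola():
    """Milligrams of caffeine carried into one liter of cola via Solution B."""
    grams = CAFFEINE_PER_L_SOLUTION_B_G * SOLUTION_B_PER_L_COLA_ML / 1000.0
    return grams * 1000.0


if __name__ == "__main__":
    for name, microliters in oil_microliters_per_liter_cola().items():
        print(f"{name}: {microliters:.2f} uL per liter of cola")
    print(f"caffeine: {caffeine_mg_per_liter_cola():.1f} mg per liter of cola")
```

The oil list sums to 100 mL, so each figure doubles as a percentage of the mixture. Under these assumptions a liter of cola carries roughly 20 microliters of essential oils in total and about 96.5 mg of caffeine.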

As with designer clothing, the moat is the brand name.

We've had perfectly good copies of Coca-Cola for decades.

> We've had perfectly good copies of Coca-Cola for decades.

Not exactly. I mean, for many people it was acceptable, but before that guy on YouTube nobody bothered to do this deep chemical analysis.

Also, even he struggled to replace the coca leaf extract, because there is only a single manufacturer in the US, with only a single customer.

First statement is true.

Second statement is false.

Because having the nominal rights and having the economic means, societal incentives, and actual desire to do so can be highly disjoint sets?

Plus, Coca-Cola itself doesn’t even use the same formula across time and space, IIRC. That clearly shows that what people buy when they reach for a Coca-Cola is not even the exact taste. You can’t replicate the whole customer experience a company provides by cloning only the tip of the iceberg that it showcases as the product.

> Humans have spent millennia harvesting and distilling each other's IP

You may be somewhat correct, but copyright lawyers wouldn’t have any work if others’ IP were up for grabs willy-nilly just because of “shoulders of giants and all that”.

I mean, there's an obvious difference between "distributing copies" (which is what the law was designed to prevent) and "training an LLM". We already managed "banning LLM output that contains copyrighted text" - it's much easier to just pirate a copy of the text. So I think the copyright lawyers will continue to have work as long as human written texts are worth buying.

> I mean, there's an obvious difference between "distributing copies" (which is what the law was designed to prevent) and "training an LLM".

What's the difference between you or me downloading an mp3 through torrents for personal use (not distributing), while risking criminal punishment in most of the western world, and BigCorp downloading petabytes' worth of copyrighted works "to train an LLM" and reselling it?

Can you or I do the same when the police come to our door?

"Dear police, don't lock me up - I was just going to train an LLM!"

Well, uh, the BigCorps already went to court and paid that cost and aren't doing it anymore? Whereas you and I are apparently still pirating MP3s and probably haven't ever been to court?