This is a bit ironic, Anthropic complaining about a competitor using claude data to build its own product when Anthropic basically used all of human knowledge production to build claude, i don't think they paid every magazine, author, journalist, etc ...

This is almost standard practice in any competitive industry anyways. Disassemble your competitor's product, study it and try to reproduce / improve.

Yea I’ll never have any sympathy for this claim given that Claude is built on theft

It's a claude eat claude world out there

Yeah, and I believe Anthropic would "distill back" without thinking twice, if the other model would be good enough.

Yup, it's hard to take seriously any complaint about "stealing" Anthropic's services, when their entire business is based on massive theft.

The US labs do seem to have announced a lot of licensing deals though, and are buying things today due to the previous lawsuits.

At what point will we be better to support a lab that pays (some) licenses today vs the ones that pay none?

Some of the deals are in the hundreds of millions, so I suspect licensing is over a billion today? (Pure guess). That might become a big disadvantage in a price (or content) war.

I haven't seen any money, have you? Until they pay everyone or release weights theres really no change. Also they're doing this after they've already stolen. Not negotiated before

My understanding is that US labs now are paying for books, news and other content from media companies, but people in the middle (like blog authors) are left out by current courts over whether fair use applies. There's definitely an argument over whether we should tighten this, but they do seem to be under increasing pressure to be legal now by our existing interpretation. Most cases are still ongoing.

One reason people love the Chinese video models is that they seem to be trained on every hollywood movie/etc and they're not shy about letting you use famous actors/characters in them. That might be an increasing advantage because the US labs are now being cautious.

At the very least the public should receive full open-weight open-source models in return for their transgressions. Failing that, may I suggest the guillotine?

In the US the courts are also pursuing labs that open their models: Meta's current court case is over the training data of the llama models they released openly.

[deleted]

I know (via probing these models) that some of my work is in the training data. My mailbox is open.

> At what point will we be better to support a lab that pays (some) licenses today vs the ones that pay none?

Why is a lab that pays all licenses today not on your list? Is ethics and morality that low on your radar?

I agree that that's a more consistent position for the people criticising the data slurping. But I don't see people advocating those open-data models in these threads? It's usually about defending the zero-licensing competitors.

My (limited, outsider) understanding is that due to the court cases US labs are pressured to be legal now (for instance, bulk scanning purchased books instead of Books3, and the licensing deals with media companies). But international labs are not. The "not licensing everything" statement is more about current copyright law not requiring licensing of everything. But that question is still up in the air as cases are ongoing.

You should. Companies like this will inevitably try and pull the ladder up behind them.

You mean Anthropic and OpenAI, right?

All major AI companies. And any other high value industry which can be locked off (via tax brakes, patients etc)

Ironically, it's likely that the only reason USG let them get away with this — instead of making obvious and necessary adjustments to copyright law — was so that the industry would remain competitive with China.

Given that the most recent time Anthropic attempted regulatory capture, the US government responded by saying "alright, we agree that Mythos is too dangerous to release, so we've banned you from releasing Mythos," I can't wait to see what the outcome of this next push is.

Anthropic did pay $1.5B to authors. But yes, it would be much better if they paid everyone on the internet dividends from every Claude chat. Or released Claude as an open model.

In practice, the former isn't very realistic, while the latter is politically dead as this is becoming a national security issue.

Anthropic was forced to pay some people they stole content from, there was no attempt at getting permission ahead of time.

And paying basically everyone online is more or less a solved problem, it's what ad agencies have to do every day.

[dead]