The US labs do seem to have announced a lot of licensing deals though, and are buying things today due to the previous lawsuits.

At what point will we be better to support a lab that pays (some) licenses today vs the ones that pay none?

Some of the deals are in the hundreds of millions, so I suspect total licensing spend is over a billion dollars today (pure guess). That might become a big disadvantage in a price (or content) war.

I haven't seen any money, have you? Until they pay everyone or release weights, there's really no change. Also, they're doing this after they've already stolen, not negotiating beforehand.

My understanding is that US labs are now paying for books, news and other content from media companies, but people in the middle (like blog authors) are left out while the courts work out whether fair use applies. There's definitely an argument over whether we should tighten this, but the labs do seem to be under increasing pressure to stay legal under our existing interpretation. Most cases are still ongoing.

One reason people love the Chinese video models is that they seem to be trained on every Hollywood movie and the like, and they're not shy about letting you use famous actors and characters in them. That might be an increasing advantage because the US labs are now being cautious.

At the very least the public should receive full open-weight open-source models in return for their transgressions. Failing that, may I suggest the guillotine?

In the US, labs that open their models are being taken to court too: Meta's current court case is over the training data of the Llama models they released openly.

I know (via probing these models) that some of my work is in the training data. My mailbox is open.

> At what point will we be better to support a lab that pays (some) licenses today vs the ones that pay none?

Why is a lab that pays all licenses today not on your list? Are ethics and morality that low on your radar?

I agree that that's a more consistent position for the people criticising the data slurping. But I don't see people advocating those open-data models in these threads? It's usually about defending the zero-licensing competitors.

My (limited, outsider) understanding is that, due to the court cases, US labs are now under pressure to stay legal (for instance, bulk scanning purchased books instead of using Books3, and signing licensing deals with media companies). But international labs are not. The "not licensing everything" statement is more about current copyright law not requiring licensing of everything. But that question is still up in the air as cases are ongoing.