The issues with LLMs go beyond just IP theft. I would not say the PRC making LLMs cheaper is the best outcome (though it is better than nothing). The best outcome would be to make the practice of training on our data without consent illegal, which would simultaneously slow down economic change and make it more organic, as well as give PRC companies fewer capabilities to extract.

> The issues with LLMs go beyond just IP theft.

There is no IP theft because LLM outputs aren't protected, just egregious ToS violations.

> There is no IP theft because LLM outputs aren't protected, just egregious ToS violations

I meant the original IP theft that occurs to train LLMs in the first place. But sure, that implies that further LLMs based on that LLM are also tainted by that original IP theft.

- Deriving a “no derivatives” licensed item is illegal, no?

- Selling a “no commercial” licensed item is illegal, no?

- Deriving and/or reproducing MIT licensed code without credit is illegal, no?

- Reproducing and/or deriving GPL code and not notifying and/or not making GPL is illegal, no?

I can't make heads or tails of your opinion-free comment, made up of only questions.

My best guess is you're suggesting that Anthropic's model outputs are transitively under copyright (as reproductions of human work under copyright?), but somehow ownership now belongs to Anthropic and not the original owners, and therefore Anthropic has standing against Alibaba? Not only does this go against what Anthropic argued in court against authors and publishers, such jurisprudence would lead to the immediate shutdown of all leading LLMs in the US, which were all trained on stolen work.

> immediate shutdown of all leading LLMs in the US

They can license training data. They have trillions, look what they are dumping into it. You seriously think they can't afford to license data?

Obviously it would have been easier if they had done it from the start, but that was their trick: do it while people don't notice and get big ASAP. Should they get away with it?

Also, it would solve their Chinese problem, because the Chinese companies would then be violating copyright too. Right now it's more like "rules for thee but not for me", so it's hard to take seriously.

I still want those data sets to become public domain. Open weights still aren't good enough.

That's the conundrum, isn't it? Anyone who posts their datasets would be immediately sued/blocked/boycotted to oblivion due to the obvious and blatant data theft, not to mention IP and copyright issues.

Nvidia's even being sued for providing scripts which automate the downloading of said data from non-Nvidia sources. We certainly don't need copyrights that last nearly a century after the author's death (they literally cannot help the author), so here's hoping that some of the disputes over all this money changing hands can rein in some of the existing copyright sprawl. A stronger public domain would provide more useful training data for everyone, including open source models, and make criminals out of fewer AI researchers.

I hope you say the same when these cheap LLMs are used in drones to target humans. The world models are built with exactly that direction in mind.

Cool beans, boomer alarmist stance. The Chinese models here are doing what they're supposed to: pricing the market accordingly.
