Hacker News

[flagged]

The issues with LLMs go beyond just IP theft. I would not say the PRC making LLMs cheaper is the best outcome (though it is better than nothing). The best outcome would be to make the practice of training on our data without consent illegal, which would simultaneously slow down economic change, make it more organic, and give PRC companies fewer capabilities to extract.

overfeed 15 hours ago

> The issues with LLMs go beyond just IP theft.

There is no IP theft because LLM outputs aren't protected, just egregious ToS violations.

anileated 14 hours ago

> There is no IP theft because LLM outputs aren't protected, just egregious ToS violations

I meant the original IP theft that occurs when training LLMs in the first place. But sure, that implies that further LLMs based on that LLM are also tainted by that original IP theft.

bayindirh 15 hours ago

- Deriving from a “no derivatives” licensed item is illegal, no?

- Selling a “no commercial” licensed item is illegal, no?

- Deriving from and/or reproducing MIT licensed code without credit is illegal, no?

- Reproducing and/or deriving from GPL code without notice, and/or without releasing the result under the GPL, is illegal, no?

overfeed 14 hours ago

I can't make heads or tails of your opinion-free comment, made up of only questions.

My best guess is you're suggesting that Anthropic's model outputs are transitively under copyright (as reproductions of human work under copyright?), but somehow ownership now belongs to Anthropic and not the original owners, and therefore Anthropic has standing against Alibaba? Not only does this go against what Anthropic argued in court against authors and publishers, such jurisprudence would lead to the immediate shutdown of all leading LLMs in the US, which were all trained on stolen work.

anileated 14 hours ago

> immediate shutdown of all leading LLMs in the US

They can license training data. They have trillions; look at what they are dumping into it. You seriously think they can't afford to license data?

Obviously it would have been easier if they had done it from the start, but that was their trick: to do it while people didn't notice and get big ASAP. Should they get away with it?

Also, it would solve their Chinese problem, because it would make the Chinese companies violate copyright too. Right now it's more like "rules for thee, not for me", so it's hard to take seriously.

8note 16 hours ago

I still want those data sets to become public domain. Open weights still aren't good enough.

bendews 16 hours ago

That's the conundrum, isn't it? Anyone who posts their datasets would be immediately sued/blocked/boycotted to oblivion due to the obvious and blatant data theft, not to mention IP and copyright issues.

timschmidt 15 hours ago

Nvidia's even being sued for providing scripts which automate the downloading of said data from non-Nvidia sources. We certainly don't need copyrights that last nearly a century after the author's death (they literally cannot help the author), so here's hoping that some of the disputes over all this money changing hands can rein in some of the existing copyright sprawl. A stronger public domain would provide more useful training data for everyone, including open source models, and make criminals out of fewer AI researchers.

throwccp 16 hours ago

[dead]

anukin 15 hours ago

I hope you say the same when these cheap LLMs are used in drones to target humans. The world models are built with exactly that direction in mind.

rekttrader 15 hours ago

Cool beans, boomer alarmist stance. The Chinese models here are doing what they’re supposed to: pricing the market accordingly.

cindyllm 15 hours ago

[dead]