I'm not sure I understand what you're arguing for? There are massive companies that are collectively profiting off of stolen IP and are now gatekeeping even their paid offerings - surely consumers will rail against this? Personally, I feel very bad and can't wait for Chinese models to continue improving as much as they can prior to OpenAI's and Anthropic's IPOs.
I’m not arguing for anything, actually. The ‘fair’ ship has sailed. Even if the pirates somehow get shut down (which would be suicide by USG; won’t happen, national security issue), open Chinese models are not even hiding the fact that they distill from the frontier US labs, thus benefiting indirectly from the stolen content.
Note I don’t particularly like the ‘stolen’ word here as I don’t like when the music and film companies use it in the same context. Copyright infringement? Sure. Theft? No.
> I don’t particularly like the ‘stolen’ word here
Except that's the standard that we've measured everyone with up until the LLM/generative tech boom. I don't see why the benchmarks should change now. I realise my argument doesn't move reality but that doesn't mean we shouldn't call a spade a spade. Said companies carried out theft (or copyright infringement if you prefer) at industrial scale, which is a far more reprehensible crime against humanity than anything the individuals we think of as "digital pirates" have committed.
> open Chinese models are not even hiding the fact that they distill from the frontier US labs
The difference is they return to the same system that they feed from (indirectly); people get access to model weights even if the entire model isn't open source. The same can't be said for OpenAI, Anthropic, Google etc (who also benefit from Chinese models and train on them).
Sure, the alternatives aren't a paragon of fairness, but I'd much rather advocate for and support the thieves who give me a better deal if my choice is limited to thieves. Especially if the thieves aren't hostile to their customers like Anthropic is (which is why I replied to you in the first place).
And the Chinese models rip IP just like everyone else before them. Your argument is moot.
This was a problem 5+ years ago. Nobody cares, or at least the majority voice across the world does not care. Cat is out of the bag and there is no way to put it back in.
EDIT: Worth noting that I have long held the belief that if you put data out on the public sidewalk that you should have low to no expectation that it’s IP. It’s how I think about Google Maps data for example. If they want to reap the benefits by not walling it off behind a user login, then they can feel the pain if folks use that information. Same applies for media that has been bought, Reddit comments, or any other datasets.
> And the Chinese models rip IP just like everyone else before them.
The difference is the Chinese models return to the same system that they feed from (indirectly); people get access to model weights even if the entire model isn't open source. The same can't be said for OpenAI, Anthropic, Google etc (who also benefit from Chinese models and train on them).
Further, Chinese models are significantly cheaper and the companies aren't hostile to their customers.
> Worth noting that I have long held the belief that if you put data out on the public sidewalk that you should have low to no expectation that it’s IP.
Except your beliefs aren't the cornerstone of modern jurisprudence. Why are models able to reliably produce replicas of Ghibli movies which go well beyond any example you listed?