That ship has sailed. I would wager all the AI labs are ingesting anything human-generated, whether that means Hollywood movies, Taylor Swift’s discography, YouTube videos or private GitHub source repos.
The reward for having a competitive edge is exponentially higher than the risk of a lawsuit. Politicians are still old bureaucrats who don’t understand technology.
So did Meta for Llama.
The entire chat thread and email exchange was exposed in discovery; apparently Zuck signed off on it. In one of the IM exchanges, one of them says ‘everyone is doing it’.
https://x.com/jason_kint/status/1879152507865485497?s=20
As I understand it, what was "explicitly illegal" was copying the books, in the sense of mere copying before feeding them to the model, and this is what the Anthropic copyright settlement is about.
Actually processing them through the model, though, was considered transformative and therefore fair use.
They didn't train on the books and that court only found that the pirating was illegal anyway.
I'd love to see an open-source project that's basically a torrent client for downloading pirated material, but it trains an AI model "in the background" using the downloaded content. That way everyone can claim fair use for possessing copyrighted material. I mean, there's precedent, right?
I am not a lawyer, but from what I understand the precedent is that you can use copyrighted material in an ML process. Even though Meta has, allegedly, pirated the material, the cost of the violation would be pennies compared to the AI spend, since the piracy is the violation, not the fact that they used those materials.
They were liable for copying the books in the first place, regardless of whether or not they trained the AI with them. Read the opinion.