I do just want to highlight that this is also what humans do. We read a bunch of content online and then use it in our work product. The vast majority of the value that I provide comes from copyrighted information that I have ingested, either directly with a payment to the creator (bought and read the book, paid for and attended the seminar) or indirectly via third-party blog posts or summaries where I did not then pay the originator of the materials.
I think there are real questions around the motivation to create novel, high-quality, valuable content (I think that motivation still exists, but it shifts toward indirect monetization for some content and paywalls for high-value material).
I don't inherently have any problem with agents (or humans) ingesting content and using it in work product. I think we just need to accept that the landscape is changing and make sure we think through why and how content is created and monetized.
100% agreed. I have yet to hear a convincing argument for why it is creative accretion when I leverage all of the music I’ve ever listened to in order to write an “original” song, but it’s base plagiarism when AI does something similar.
The only remotely credible position I’ve heard is “because humans are special, and AI is just a machine”, which is a doctrine but not an argument.
This whole discussion would have been incomprehensible any time before 1700 or so, when the idea that creators had exclusive rights to their work first appeared.
Somehow, human culture survived thousands of years during which people just made things, copied things, and iterated on others’ ideas. And now many of the same people who decried perpetual copyright are somehow railing against a frequently transformative use.
I think what gets most people is the double standard.
IP should either exist for everyone (which would cripple LLM providers) or for no one, in which case the Pirate Bay and shadow libraries should be fully open.
Re: the higher-ranking plagiarism, that stings and makes sense. AEO and SEO are a thing. We need better mechanisms for identifying "root sources" of content; it's something I find myself working on personally. As I ingest sources for my book, I need to be able to build a classifier that incrementally gets better at finding origin sources. That said, it's in my interest to do that, because there is differentiated value in having access to the sources that regularly provide novel, valuable content.
To be fair, there is also value (at least for now) in sites that aggregate quality content and republish it as a secondary level of discovery, in case my agents don't go far enough down the search results. But I'd expect that value to diminish over time as I better tune my research and build my lists of originating authors.
And to be clear, I don't like the idea of people stealing someone else's content and republishing it without attribution (although that was going on long before ChatGPT), but I think that now we can all run agentic research teams, the "bad actors" will slowly get filtered out of the ecosystem.
> We read a bunch of content online and then use it in our work product.
We also have societal norms around plagiarism.
Additionally, the claim that because people have the right to do something, we should extend that right to machines is a strong one (and one I certainly reject).