> Genuine question: if I train my model with copyleft material, how do you prove I did?
An inverse of this question is arguably even more relevant: how do you prove that the output of your model is not copyrighted (or otherwise encumbered) material?
In other words, even if your model was trained strictly on copyleft material, but properly prompted outputs a copyrighted work is it copyright infringement and if so by whom?
Do not limit your thoughts to text only. "Draw me a cartoon picture of an anthropomorphic with round black ears, red shorts and yellow boots". Does it matter if the training set was all copyleft if the final output is indistinguishable from a copyrighted character?
> even if your model was trained strictly on copyleft material
That's not legal use of the material according to most copyleft licenses. Regardless if you end up trying to reproduce it. It's also quite immoral if technically-strictly-speaking-maybe-not-unlawful.
> That's not legal use of the material according to most copyleft licenses.
That probably doesn't matter given the current rulings that training an AI model on otherwise legally acquired material is "fair use", because the copyleft license inherently only has power because of copyright.
I'm sure at some point we'll see litigation over a case where someone attempts to make "not using the material to train AI" a term of the sales contract for something, but my guess would be that if that went anywhere it would be on the back of contract law, not copyright law.