We use pop-culture references to communicate all the time these days. Those references don't come only from the best-known parts of these works, so an AI would need the full work (or a functional transformation of it) to reach the theoretical maximum of its ability to decode and reason with such references. To exclude copyrighted works from the training set is to expect the AI to decode, from the outside, what amounts to humanity's own in-group jokes.
That's my formal argument. The less formal one is that smaller artists deserve copyright protection more than rich conglomerates do, and even then, durations shouldn't be "eternity and a day". A huge chunk of what is being "stolen" should be in the commons anyway.
"Your honor, if I hadn't robbed that bank I wouldn't have gotten all that money!"