> This highlights just how much unlicensed copyrighted material is in LLM training sets (whether you consider that fair use or not).

Is there any licensed copyrighted material in their original training sets? AFAIK, they just scraped it all regardless of the license.