Are there complete lists in the suits? Last time I skimmed them, they contained allegations about sources and some admissions (The Pile, LibGen, Books3, PiLiMi, scanned books, web scrapes, and a few other sources I don't remember), but AFAIK there isn't a complete inventory of the training datasets they used.