See https://digitalcorpora.org/corpora/file-corpora/cc-main-2021... for a set of 8 million PDF files from the web, as seen by a single crawl of Common Crawl.