> If you take all the works of Shakespeare, and reduce it to tokens and vectors is it Shakespeare or is it factual information about Shakespeare?
To rephrase the question:
Is a PDF of the complete works of Shakespeare itself Shakespeare, or is it factual information about Shakespeare?
Reencoding human-readable information into a form that's difficult for humans to read without machine assistance is nothing new.
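To make that concrete, here is a minimal Python sketch of my own (the sample line and variable names are just for illustration). It re-encodes a line of text into compressed bytes and into a list of integers, neither of which a human can comfortably read, and then recovers the original exactly. It only illustrates lossless re-encoding; it says nothing about how any particular model's tokenizer or vectors behave.

```python
import zlib

# Illustrative sample text (a public-domain line of verse).
text = "Shall I compare thee to a summer's day?"

# Re-encode as compressed bytes: unreadable to a human without a machine.
encoded = zlib.compress(text.encode("utf-8"))

# Re-encode as a list of integers, one per UTF-8 byte.
byte_ids = list(text.encode("utf-8"))

# Both forms decode back to exactly the original text.
decoded = zlib.decompress(encoded).decode("utf-8")
from_ids = bytes(byte_ids).decode("utf-8")

print(encoded)
print(byte_ids)
print(decoded == text, from_ids == text)
```

In both cases the work is fully contained in the output; only the representation has changed.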
Like most things in law, the answers are going to come down to intent and outcome. If you distribute the PDF to other people with the intent that they can read the copyrighted works of an author, then you have distributed that author's content in violation of copyright. If, on the other hand, you encrypted the entire contents of that PDF, threw away the encryption key, and then published prints of the encrypted PDF as artwork of binary code, that's probably going to fall on the side of "fair use", even though the entire copyrighted work is input to and contained in your final output. You might still get into some legal hot water if you promoted your work using the author's name, but that's more of a trademark issue than a copyright issue.
> Like most things in law, the answers are going to come down to intent and outcome. If you distribute the PDF...
I wasn't talking about distribution, and neither was the person whom I was replying to. But, thanks for wasting your time on publishing the rest of your comment, I guess.