Books?

The vast, vast majority of AI training data is not books. I wouldn't be surprised if there's more text on HN alone than in every book in the history of mankind (and most of those books are no longer under copyright anyway).