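You can extract text from PDF files. (There are a number of dedicated models for that, but even a humble command-line tool like poppler's `pdftotext` can do it. Pandoc, despite its reputation as a universal converter, can write PDFs but cannot read them.)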
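For the simple non-model route, here is a minimal Python sketch. It assumes poppler's `pdftotext` CLI is installed and on `PATH`; the helper names `pdftotext_cmd` and `extract_text` are hypothetical. It passes `-layout` to keep the page's physical layout as far as possible, and passes `-` as the output file so the text goes to stdout instead of a `.txt` file.

```python
import shutil
import subprocess


def pdftotext_cmd(pdf_path, layout=True):
    """Build the pdftotext command line (hypothetical helper)."""
    cmd = ["pdftotext"]
    if layout:
        # Keep the original physical layout (columns, spacing) as far as possible.
        cmd.append("-layout")
    # "-" as the output file sends the extracted text to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=True):
    """Return the text of a PDF by shelling out to pdftotext (assumed installed)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install poppler-utils")
    result = subprocess.run(
        pdftotext_cmd(pdf_path, layout),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Usage would be something like `print(extract_text("paper.pdf"))`. This only works for PDFs that already contain a text layer. Scanned pages are just images, and that is where OCR or the dedicated models come in.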