Long shot, but I wonder whether an image of the PDF would do better, if it did come unstuck on the internal formats.

It definitely does. PDF is historically a vector-based image format, and all the add-ons that make it behave a bit more sanely as a text-oriented document format are optional, so your mileage with tools like pdftotext will vary greatly depending on who created a given PDF.
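
For what it's worth, here is a rough sketch of how one might hedge between the two approaches: try pdftotext first, and fall back to rendering page images if the extracted text looks thin or garbled. It assumes the poppler command-line tools (pdftotext and pdftoppm) are installed; the function names, the `page` output prefix, and the thresholds (200 characters, 60% alphanumeric) are made up for illustration and would need tuning on real files.

```python
import subprocess


def text_quality(text: str) -> float:
    """Fraction of non-whitespace characters that are letters or digits."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    return sum(c.isalnum() for c in visible) / len(visible)


def extract_text(pdf_path: str) -> str:
    """Run pdftotext; the trailing '-' sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def render_pages(pdf_path: str, prefix: str, dpi: int = 150) -> None:
    """Rasterize every page to PNG files named after the given prefix."""
    subprocess.run(
        ["pdftoppm", "-png", "-r", str(dpi), pdf_path, prefix],
        check=True,
    )


def text_or_images(pdf_path: str, prefix: str = "page",
                   min_chars: int = 200, min_quality: float = 0.6) -> str:
    """Return "text" if extraction looks usable, else render images.

    The thresholds are arbitrary assumptions, not recommended values.
    """
    text = extract_text(pdf_path)
    if len(text.strip()) >= min_chars and text_quality(text) >= min_quality:
        return "text"
    render_pages(pdf_path, prefix)
    return "images"
```

Calling `text_or_images("some.pdf")` would then tell you which route was taken for that file. The quality check is deliberately crude: a scanned PDF with no text layer yields almost nothing from pdftotext, and one with broken font encodings tends to yield a lot of symbol noise, so both should fail it.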