Use pymupdf to extract the PDF text. Hell, run that nasty business through an LLM as step-2 to get a beautiful clean markdown version of the text. Lord knows the PDF format is horribly complex!