Other PDF parsing woes include:

1. Identifying form elements like check boxes and radio buttons. 2. Badly oriented PDF scans 3. Text rendered as bezier curves 4. Images embedded in a PDF 5. Background watermarks 6. Handwritten documents

PDF parsing is hell indeed: https://unstract.com/blog/pdf-hell-and-practical-rag-applica...