I don't know how, but PyMuPDF4LLM is based on Tesseract and has GNN-based layout detection.