So I tried this on the NVMe specification (I have a huge library of PDFs) and it worked decently, though the output had some oddities:
- Parts of the table of contents were rendered as headings.
- Tables came out as links to separate Markdown files, which I didn't like.
In theory, I could recombine everything into one document, but that would require complicated Markdown parsing and manipulation, and I wasn't sure how to go about it given how free-form the resulting text was. I also haven't gone through the entire document (it's 784 pages) to check it against what pdftotext or Acrobat would produce, so there's that too.
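
For what it's worth, the recombining step might not need a full Markdown parser. Below is a rough sketch, not a tested solution for this tool's output. It assumes the tables are referenced with ordinary inline links such as `[Table 1](table_1.md)`, and that the paths are relative to the main file. Both are guesses about the output format. The names `inline_linked_markdown` and `inline_from_disk` are made up for illustration.

```python
import re
from pathlib import Path

# Assumed link shape: [some label](path/to/file.md)
MD_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+\.md)\)")


def inline_linked_markdown(text, load):
    """Replace each link to a .md file with that file's contents.

    `load` is any callable that maps a link target to its text.
    Links whose target cannot be loaded are left untouched.
    """
    def substitute(match):
        target = match.group(2)
        try:
            body = load(target)
        except (OSError, KeyError):
            return match.group(0)  # keep the original link
        # Blank lines around the body so a table stays its own block.
        return "\n\n" + body.strip() + "\n\n"

    return MD_LINK.sub(substitute, text)


def inline_from_disk(main_path):
    """Hypothetical helper: resolve link targets relative to the main file."""
    main = Path(main_path)
    return inline_linked_markdown(
        main.read_text(encoding="utf-8"),
        lambda target: (main.parent / target).read_text(encoding="utf-8"),
    )
```

This only does text substitution, so it would also inline any link to a `.md` file that isn't a table, and it wouldn't fix the table-of-contents headings. The result would still need spot-checking against the PDF.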