It's definitely far easier to emit a controlled, useful subset of PDF than it is to parse PDF documents. I wrote a small PDF library for the Decker ecosystem that just focuses on bitmaps and page layout; it comes to roughly 4 KB and 135 LoC.
docs/demos: https://beyondloom.com/decker/pdf.html
browsable source: https://github.com/JohnEarnest/Decker/blob/main/examples/dec...
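To give a rough sense of how small a "write-only" subset can be, here is a minimal sketch in Python. It is not the Decker library's code or API. The function name `build_pdf` and its parameters are made up for illustration, and it assumes a single page holding one uncompressed 8-bit grayscale bitmap stretched to fill the page.

```python
def build_pdf(width, height, pixels, page_w=612, page_h=792):
    """Return the bytes of a one-page PDF showing an 8-bit grayscale bitmap.

    Hypothetical helper for illustration: `pixels` holds width*height bytes,
    row by row from the top, and page sizes are whole numbers of points.
    """
    if len(pixels) != width * height:
        raise ValueError("pixels must hold exactly width*height bytes")

    # Content stream: scale the unit image square up to the page, then draw it.
    content = b"q %d 0 0 %d 0 0 cm /Im0 Do Q" % (page_w, page_h)

    # Object bodies, numbered 1..5 in list order.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
        b"/Resources << /XObject << /Im0 4 0 R >> >> "
        b"/Contents 5 0 R >>" % (page_w, page_h),
        b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
        b"/ColorSpace /DeviceGray /BitsPerComponent 8 /Length %d >>\n"
        b"stream\n" % (width, height, len(pixels))
        + bytes(pixels) + b"\nendstream",
        b"<< /Length %d >>\nstream\n" % len(content)
        + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes long.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Something like `open("out.pdf", "wb").write(build_pdf(2, 2, bytes([0, 255, 255, 0])))` should produce a tiny checkerboard page. The only real bookkeeping is recording byte offsets for the xref table, which is a big part of why writing is so much easier than parsing.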
This Decker stuff is pretty nifty too.
I’m working on one right now. It takes arbitrary PDFs and builds composable, dynamic pandoc pipelines whose output matches the source byte for byte. It’s very, very complex, but if I can get it finished it will fuck over Adobe, so it’s worth it.