I created a PowerShell script to run this locally on any PDF: https://gist.github.com/kordless/652234bf0b32b02e39cef32c71e...
It does work, but it is very slow on my older GPU (NVIDIA GTX 1080, 8 GB). I'd estimate it's taking at least 5 minutes per page right now, possibly more.
Edit: If anyone is interested in trying a PDF-to-Markdown conversion utility built on this and hosted on Cloud Run (with GPU support), let me know. It should be done in about an hour, and I'll post a link here when it's ready.
Reporting back: here's some sample output from https://www.sidis.net/animate.pdf:
I haven't seen ANY errors in what it has done, which is quite impressive. Here it's handling tables of contents (I used a slightly different copy of the PDF than the one I linked):
Other than being ridiculously slow, this seems quite good at doing what it says it does.

Very, very interested!
OK, I have it built, but things came up, so I'm testing it this morning (it's probably still broken, but the code is all there):
https://github.com/kordless/gnosis-ocr