I created a PowerShell script to run this locally on any PDF: https://gist.github.com/kordless/652234bf0b32b02e39cef32c71e...

It does work, but it is very slow on my older GPU (Nvidia GTX 1080, 8GB). It's taking at least 5 minutes per page right now, possibly more.

Edit: If anyone is interested in trying a PDF to markdown conversion utility built on this and hosted on Cloud Run (with GPU support), let me know. It should be done in about an hour or so, and I will post a link here when it's ready.

Reporting back on this, here's some sample output from https://www.sidis.net/animate.pdf:

  THE ANIMATE
  AND THE INANIMATE

  WILLIAM JAMES SIDIS

  <img>A black-and-white illustration of a figure holding a book with the Latin phrase "ARTI et VERITATI" below it.</img>

  BOSTON

  RICHARD G. BADGER, PUBLISHER

  THE GORHAM PRESS

  Digitized by Google

I haven't seen ANY errors in what it has done, which is quite impressive.

Here, it's doing tables of contents (I used a slightly different copy of the PDF than I linked to):

  <table>
    <tr>
      <td>Chapter</td>
      <td>Page</td>
    </tr>
    <tr>
      <td>PREFACE</td>
      <td>3</td>
    </tr>
    <tr>
      <td>I. THE REVERSE UNIVERSE</td>
      <td>9</td>
    </tr>
    <tr>
      <td>II. REVERSIBLE LAWS</td>
      <td>14</td>
    </tr>

Other than the fact it is ridiculously slow, this seems to be quite good at doing what it says it does.
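
If you want actual Markdown tables rather than the HTML the model emits, a post-processing step along these lines might work. This is only a sketch and is not taken from the gist or the repo: `TableToMarkdown` and `html_table_to_markdown` are names made up for illustration, it assumes the first row is the header, and it only uses Python's standard `html.parser`. The sample is cut off without a closing `</table>`, like the excerpt above, which the parser tolerates.

```python
from html.parser import HTMLParser


class TableToMarkdown(HTMLParser):
    """Collect <tr>/<td> cells from HTML table output (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            text = "".join(self._cell).strip()
            # Escape pipes so they do not break the Markdown table.
            self._row.append(text.replace("|", "\\|"))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_markdown(html):
    """Return a Markdown table; assumes the first row is the header."""
    parser = TableToMarkdown()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return ""
    header, *body = parser.rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


sample = """
<table>
  <tr><td>Chapter</td><td>Page</td></tr>
  <tr><td>PREFACE</td><td>3</td></tr>
  <tr><td>I. THE REVERSE UNIVERSE</td><td>9</td></tr>
  <tr><td>II. REVERSIBLE LAWS</td><td>14</td></tr>
"""
print(html_table_to_markdown(sample))
```

Rows that were never closed with `</tr>` are dropped rather than guessed at, which seems like the safer choice for OCR output that may be cut off mid-table.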

Very very interested!

OK, I have it built, but things came up, so I'm testing it this morning (it's probably still broken, but the code is all there):

https://github.com/kordless/gnosis-ocr