Only remotely related, but I have yet to find a reliable solution for table page classification, i.e. a classifier that returns the indices of the pages in a document that contain tables.
Solutions built on things like img2table or pymupdf are really bad (pymupdf is not even reliable for text PDFs).
In my experience, this task is incredibly difficult to solve in the general case.
Handcrafting rules for the specific dataset is the only way to get high performance.
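To give an idea of what I mean by handcrafting, here is a minimal sketch of a rule-based page classifier. It assumes the text of each page has already been extracted with its whitespace layout preserved (by whatever tool works for your files), and it flags a page when it sees a run of consecutive lines that split into several columns. The names `pages_with_tables` and `count_cells` and the thresholds `MIN_COLUMNS` and `MIN_ROWS` are all made up for illustration, not taken from any library, and the thresholds would need tuning per dataset.

```python
import re

# Hypothetical thresholds: tune them for the dataset at hand.
MIN_COLUMNS = 3  # cells a line needs to look like a table row
MIN_ROWS = 3     # consecutive row-like lines needed to flag a page

# A cell boundary is a tab or a run of two or more spaces/tabs.
_CELL_GAP = re.compile(r"[ \t]{2,}|\t")


def count_cells(line: str) -> int:
    """Count the cells in a line after splitting on cell boundaries."""
    stripped = line.strip()
    if not stripped:
        return 0
    return len(_CELL_GAP.split(stripped))


def pages_with_tables(pages: list[str]) -> list[int]:
    """Return 0-based indices of pages with a run of table-like lines."""
    hits = []
    for index, text in enumerate(pages):
        run = 0
        for line in text.splitlines():
            # Extend the run on a row-like line, otherwise reset it.
            run = run + 1 if count_cells(line) >= MIN_COLUMNS else 0
            if run >= MIN_ROWS:
                hits.append(index)
                break
    return hits


if __name__ == "__main__":
    sample = [
        "Just a paragraph of prose.\nAnother line of text.",
        "Name   Qty   Price\nApple   3   1.20\nPear   5   0.80\n",
    ]
    print(pages_with_tables(sample))
```

A rule like this only works when tables show up as whitespace-aligned columns in the extracted text, so it breaks on scanned pages, multi-column prose, and tables with ruled cells but tight spacing. That is exactly why it does not generalize.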