Hacker News

only loosely related, but I have yet to find a reliable solution for classifying which pages of a document contain tables, i.e. a classifier that returns the indices of the pages that contain tables

solutions built on things like img2table or pymupdf are really bad at this (pymupdf is not even reliable for text-based PDFs)