Hacker News

> "This is exactly the reason why Computer Vision approaches for parsing PDFs works so well in the real world."

Well, to be fair, in many cases there's no way around it anyway since the documents in question are only scanned images. And the hardest problems I've seen there are narrative typography artbooks, department store catalogs with complex text and photo blending, as well as old city maps.