Hacker News

PDF to raster seems a lot easier than PDF to structured data, at least in terms of dealing with the odd edge cases. PDF is designed to rasterize consistently, and if a generator produces something that doesn't render correctly in enough viewers, someone will fix it. Nothing in PDF constrains generators to a sensible structured representation of the information in the document, and most people generating PDFs look at the output rather than running it through a system to extract the structured data.
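
To make that concrete, here's a toy sketch in Python. It is not real PDF syntax: I'm assuming a "page" is just a list of hypothetical `(x, y, text)` draw operations, and `rasterize` and `extract_in_stream_order` are made-up helpers. Two generators emit the same operations in a different order; the rendered page is identical, but naive extraction in stream order is not.

```python
# Toy model only: a "page" is a list of (x, y, text) draw operations.
# This is an illustration, not how any real PDF library represents a page.

def rasterize(ops, width=12, height=2):
    """Paint each text run onto a character grid at its (x, y) position."""
    grid = [[" "] * width for _ in range(height)]
    for x, y, text in ops:
        for i, ch in enumerate(text):
            grid[y][x + i] = ch
    return ["".join(row) for row in grid]

def extract_in_stream_order(ops):
    """Naive extraction: concatenate text in the order it was emitted."""
    return "".join(text for _, _, text in ops)

# Same draw operations, emitted in two different orders.
sensible = [(0, 0, "Name:"), (6, 0, "Alice"), (0, 1, "Total:"), (7, 1, "42")]
scrambled = [(7, 1, "42"), (6, 0, "Alice"), (0, 1, "Total:"), (0, 0, "Name:")]

# Both orders paint exactly the same page.
assert rasterize(sensible) == rasterize(scrambled)

print(extract_in_stream_order(sensible))   # Name:AliceTotal:42
print(extract_in_stream_order(scrambled))  # 42AliceTotal:Name:
```

Anyone eyeballing the output of either generator sees the same page, so nothing pushes the second one to emit its text in reading order.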