Hacker News

diptanu 3 days ago [ - ]

We parse PDFs to convert them to text in a linearized fashion. The use case for this would be to use the content for downstream use cases - search engine, structured extraction, etc.

throwaway4496 3 days ago [ - ]

None of that changes the fact that to get a raster, you have to solve the PDF parsing/rendering problem anyways, so might as well get structured data out instead of pixels so that it now another problem (OCR).