Hacker News

I would hire someone who understands PDFs instead of doing the equivalent of printing a digital document and scanning it for "digital record keeping". Stop everything and hire someone who understands the basics of data processing and some PDF.

sidebute 2 days ago [ - ]

What's the economic justification?

Let's assume we have a staff of 10 and they're fully allocated to committed features and deadlines, so they can't be shifted elsewhere. You're the CTO and you ask the BOD for another $150k/y (fully burdened) + equity to hire a new developer with PDF skills.

The COB asks you directly: "You can get a battle-tested PDF parser off-the-shelf for little or no cost. We're not in the PDF parser business, and we know that building a robust PDF parser is an open-ended project, because real-world PDFs are so gross inside. Why are you asking for new money to build our own PDF parser? What's your economic argument?"

And the killer question comes next: "Why aren't you spending that $150k/y on building functionality that our customers need?" If don't give a convincing business justification, you're shoved out the door because, as a CTO, your job is building technology that satisfies the business objectives.

So CTO, what's your economic answer?

throwaway4496 2 days ago [ - ]

The mistake all of you're making is the assumption that PDF rendering means rasteration. Everything else crumbles down from that misconception.

user____name a day ago [ - ]

So if you receive a pdf full of sections containing prerasterized text (e.g adverts, 3d rendered text with image effects, scanned documents, handwritten errata), what do you do? You cannot use OCR because apparently only pdf-illiterate idiots would try such a thing?

throwaway4496 21 hours ago [ - ]

I wouldn't start by rastering the rest of the PDF. In business world, unlike academia and bootleg books and file sharing, majority of PDFs are computer generated. I know because I do this for a living.