Hacker News

> someone made a great ... in-memory low level pdf reading and writing data structure

Are you suggesting Adobe's Core Object Application Programming Interface (COAPI) for PDF isn't sufficient?

Kidding!

I worked on print production software in the '90s. Stuff like image positioning (e.g., bookwork), trapping, color separations, etc. Adobe's SDKs, for both PostScript and PDF, were most turrible. For our greenfield product for packaging (printing boxes), I wrote a minimalist PDF library, supporting just the feature set we needed. So simple.

Of course, PDF is now an ever-growing, katamari-style All The Things amalgamation of, oops, sorry I ran out of adjectives.

Back to your point: after URLs and HTTP, the DOM is the 3rd best thing spawned by "the web".

The DOM concept itself. Isomorphism between in-memory and serialized. That it's all just an object graph. Composition over inheritance.

Not the actual DOM API; gods no.

I understand that API design is wicked hard. But how is it that of the Java tools, only JDOM2 (the sequel) managed to get the class hierarchy correct? So that incorrect usage is not permitted?

(I haven't looked at popular libraries for other languages. I assume they also fell into the trap of transliterating JavaScript's DOM API, as dom4j and its successors did.)
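To make the class-hierarchy point concrete, here is a minimal Java sketch of the idea. The classes `Node`, `Text`, and `Element` and the method `addChild` are hypothetical and are not JDOM2's actual API. Only the container type exposes `addChild`, so attaching a child to a text node is a compile-time error. By contrast, the W3C DOM declares `appendChild` on every `Node` and throws a `DOMException` (`HIERARCHY_REQUEST_ERR`) at runtime for such insertions. Serialization walks the same object graph, which loosely illustrates the in-memory/serialized correspondence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal node hierarchy (not JDOM2's real classes).
// Only the container type exposes addChild, so adding children to a leaf fails at compile time.
abstract class Node {
    // Each node writes its own markup; containers recurse into their children.
    abstract void writeTo(StringBuilder out);

    String serialize() {
        StringBuilder out = new StringBuilder();
        writeTo(out);
        return out.toString();
    }
}

// Leaf node: there is no addChild method, so text.addChild(...) does not compile.
final class Text extends Node {
    private final String value;

    Text(String value) { this.value = value; }

    @Override
    void writeTo(StringBuilder out) {
        // Escape characters that would otherwise be read as markup.
        for (char c : value.toCharArray()) {
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default: out.append(c);
            }
        }
    }
}

// Container node: holds children by composition (a list), not via a deep inheritance chain.
final class Element extends Node {
    private final String name;
    private final List<Node> children = new ArrayList<>();

    Element(String name) { this.name = name; }

    // Returns this so that calls can be chained.
    Element addChild(Node child) {
        children.add(child);
        return this;
    }

    // Read-only view; callers must go through addChild to modify the tree.
    List<Node> children() { return Collections.unmodifiableList(children); }

    @Override
    void writeTo(StringBuilder out) {
        out.append('<').append(name).append('>');
        for (Node child : children) {
            child.writeTo(out);
        }
        out.append("</").append(name).append('>');
    }
}
```

For example, `new Element("p").addChild(new Text("a < b")).serialize()` yields `<p>a &lt; b</p>`, while `new Text("x").addChild(...)` is rejected by the compiler. The sketch does not guard against cycles (adding an element to its own descendant), which a real library would have to check at runtime.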

I'm just repeating your point (I think) that Adobe should have staked a strong starting conceptual position on PDF internals, what a PDF is. Something more WinForms and less Win32.

30+ (?!) years later, I'm still flabbergasted by PDF's success, despite Adobe's stewardship.

PS- And another thing...

For a print description language, I greatly preferred HP's PCL-5. Emotionally, it just feels more honest somehow. Initially, Adobe couldn't decide if PDF was for print control or documents. Customers wanted documents, so Adobe grudgingly complied, haphazardly.

At least "the web" had/has committees.

"Adobe couldn't decide if PDF was for print control or documents"

Apparently people don't understand the history of PDF. PDF was originally a way to encapsulate PostScript so you could display it on a screen. Unlike PCL, PostScript (and PDF) were device-independent, with a WYSIWYG guarantee. PostScript and PDF are literally the history of WYSIWYG on personal computers and computer-based printing/typesetting.

PDF is not "print control" in the sense of a job control language. PDF has always been about documents, and the features of PDF files can be seen as an attempt by Adobe to both drive and follow the market's evolution of document handling.

PDF is complicated because it's used widely for lots of different things, including printing. And if you've never worked in the printing industry you have no idea how much of a PITA it is.

PDF succeeded for a lot of reasons, but probably the easiest explanation is that PDFs were easier to create: you just printed, and the PDF printer driver spat out a PDF file that you could share everywhere.

mannyv 5 hours ago

sleepybrett 3 hours ago

One of my first jobs was at an ISP/web/cohost company. We had a big bank of modems for dial-up customers, some customers who terminated ISDN with us, a rack of colocation, and we built websites as well.

The company was partially owned by, and housed primarily in, a print shop. We worked above the press floor, and I was sometimes pressed into service helping out when we were slow. (I had some experience working in a print shop in high school, helping with PageMaker and helping to run the big Heidelberg, and similarly in college.)

Nothing like ending your day writing Perl CGI scripts and troubleshooting customers' damn Winsock configurations, and then going home and coughing up whatever color was running on the presses that day.