Ooh blast from the past!

At a previous company we moved off wkhtmltopdf to a Node.js service that received static HTML and rendered it to PDF using PhantomJS. These days you'd probably use Puppeteer.

The trick was keeping the page context open, which avoided both the browser startup cost and recreating `page` on every request. The Node service would initialize a page object once, with a script inside it that communicated with the server via a named Linux pipe. Then, for each request:

1. node service sends the static html to the page over the pipe

2. the page script receives the html from the pipe, inserts it into the DOM, and sends an “ack” back over the pipe

3. the node service receives the “ack” and calls the pdf rendering method on the page.

I don’t remember why we chose the pipe method: I’m sure there’s a better way to pass data to headless contexts these days.
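For what it's worth, here's a rough sketch of how the same keep-one-page-open idea might look today without the pipe. This is not what we ran back then. `PageLike`, `PdfRenderer` and `FakePage` are names I made up for illustration; the only real API assumed is that Puppeteer's page object has `setContent` and `pdf` methods, which is the shape `PageLike` mirrors. Requests are queued so that two renders never share the page at the same time.

```typescript
// Minimal subset of a headless-browser page. The method names mirror
// Puppeteer's page.setContent and page.pdf, so a real page should fit this shape.
interface PageLike {
  setContent(html: string): Promise<void>;
  pdf(): Promise<Uint8Array>;
}

// Hypothetical wrapper: owns one long-lived page and renders one request at a time.
class PdfRenderer {
  private readonly page: PageLike;
  // Tail of the request queue; each render waits for the previous one to settle.
  private tail: Promise<unknown> = Promise.resolve();

  constructor(page: PageLike) {
    this.page = page;
  }

  render(html: string): Promise<Uint8Array> {
    const job = this.tail.then(async () => {
      await this.page.setContent(html); // steps 1 and 2: get the HTML into the page
      return this.page.pdf();           // step 3: render the loaded page to PDF
    });
    // A failed render must not block the requests queued behind it.
    this.tail = job.catch(() => undefined);
    return job;
  }
}

// Stand-in page for trying the queue without a browser; it is not a real renderer.
class FakePage implements PageLike {
  private current = "";

  async setContent(html: string): Promise<void> {
    // Yield once, so unserialized callers would interleave here.
    await Promise.resolve();
    this.current = html;
  }

  async pdf(): Promise<Uint8Array> {
    return new TextEncoder().encode(`PDF:${this.current}`);
  }
}
```

With Puppeteer you'd presumably pass in the page from `browser.newPage()` instead of `FakePage`, created once at service startup. The queue is the part that matters: without it, a second request could overwrite the page content before the first one has been rendered.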

The whole thing was super fast (~20ms) compared to wkhtmltopdf, which took at least 30 seconds for us and more often than not would just time out.

Sounds like fun considering how real the problem is.

It was!

I remember the afternoon I had the idea: it was beer Friday, and it took a few hours to write up a basic prototype that rendered a PDF in a few hundred milliseconds. That was the first time I’d written a 100x speed improvement. Felt like a real rush.

Congratulations. Doesn't this approach make so much more sense than writing a browser engine from scratch?

Maybe? I'd say it depends on what you're rendering. We rendered HTML that we created ourselves, filled in with data that we parsed and validated. Styles across the documents generated were also largely the same.

If your job is to render arbitrary user HTML, this could get much more hairy. First of all, print rendering at the time (and probably now) was notoriously finicky. Things like adjusting colors, getting SVGs to render properly, and pagination were difficult. It took a lot of effort to get right.

Furthermore, if you're sending arbitrary HTML, you now have a much larger attack surface. Because the page is reused across requests, if someone figures out how to call `addEventListener` within the page context, they can snoop on every PDF generated by that page.