Just print to PDF in a browser, or automate that with a browser automation tool. For a non-browser-based open source solution, try WeasyPrint:

https://weasyprint.org/
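
For anyone who hasn't tried it, here is a minimal sketch of the WeasyPrint route. It assumes WeasyPrint is installed (`pip install weasyprint`), and the function name `html_to_pdf` is just for illustration. Calling `write_pdf()` with no target returns the PDF as bytes.

```python
from typing import Optional


def html_to_pdf(html: str, base_url: Optional[str] = None) -> bytes:
    """Render an HTML string to PDF bytes with WeasyPrint.

    base_url is used to resolve relative links, images and stylesheets.
    """
    # Imported inside the function so this module loads even without WeasyPrint.
    from weasyprint import HTML

    # With no target argument, write_pdf() returns the document as bytes.
    return HTML(string=html, base_url=base_url).write_pdf()
```

To render a live page instead of a string, `HTML(url="https://weasyprint.org/").write_pdf("weasyprint.pdf")` writes straight to a file. Keep in mind that WeasyPrint does not run JavaScript.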

For a proprietary solution, try Prince XML:

https://www.princexml.com/

WeasyPrint works really well for me. It can support all of the languages and fonts I need. I run it on AWS Lambda and in Docker as a web service.
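
The comment doesn't say how the Lambda service is wired up, so here is a hypothetical shape for it: a Lambda-style handler that takes an HTML body (API Gateway proxy event format) and returns the PDF base64-encoded. `make_handler` and `render_with_weasyprint` are made-up names, and the renderer is injectable so the handler can be exercised without WeasyPrint installed. Depending on your API Gateway setup, binary responses may need extra configuration.

```python
import base64
from typing import Any, Callable, Dict, Optional


def render_with_weasyprint(html: str) -> bytes:
    # Lazy import: WeasyPrint (and its native deps) only has to exist at run time.
    from weasyprint import HTML
    return HTML(string=html).write_pdf()


def make_handler(render: Callable[[str], bytes] = render_with_weasyprint):
    """Build a Lambda-style handler that turns a POSTed HTML body into a PDF."""

    def handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, Any]:
        body = event.get("body") or ""
        # API Gateway sets isBase64Encoded when the request body arrived base64-encoded.
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        if not body.strip():
            return {"statusCode": 400, "body": "empty HTML body"}
        pdf = render(body)
        # Binary payloads travel back through API Gateway as base64 text.
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/pdf"},
            "body": base64.b64encode(pdf).decode("ascii"),
            "isBase64Encoded": True,
        }

    return handler


# Module-level entry point for Lambda; WeasyPrint is only imported on first render.
handler = make_handler()
```

The same `render_with_weasyprint` function can sit behind any small HTTP framework in a Docker container; only the request/response plumbing changes.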

I previously used wkhtmltopdf, but it hasn't been maintained for years and doesn't support modern CSS, etc. It does run JS if you need it, but for JS-heavy pages I'd probably look at headless Chromium or another solution instead.
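
If you do need JS, one way to drive headless Chromium is Playwright. That choice is mine, not something named above. This is a sketch that assumes `pip install playwright` followed by `playwright install chromium`; `url_to_pdf_chromium` is a hypothetical name.

```python
def url_to_pdf_chromium(url: str, out_path: str, screen_css: bool = False) -> None:
    """Render a (possibly JavaScript-heavy) page to PDF with headless Chromium."""
    # Lazy import so the module loads without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        try:
            page = browser.new_page()
            if screen_css:
                # Use the on-screen styles; many sites have no print stylesheet.
                page.emulate_media(media="screen")
            # Wait for network activity to settle so client-side rendering can finish.
            page.goto(url, wait_until="networkidle")
            # page.pdf() only works in headless Chromium.
            page.pdf(path=out_path, format="A4", print_background=True)
        finally:
            browser.close()
```

This is the "running a browser engine" trade-off discussed further down: it handles JS, but you now have to keep a Chromium build working in your deployment.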

Edit: Previous post with some good discussion: https://news.ycombinator.com/item?id=26578826

This is my experience and recommendation too.

+1 to weasyprint; I have used weasyprint with a django production system for a few years now, and it works well enough that I never have to think about it. I'm not doing anything fancy, but it has worked well for me.

I’ve had excellent experience with Prince XML and poor experience with everything else I’ve tried. Prince is fast, robust and reliable.

Yes it costs money. So does developer time.

Agreed. Prince also has a lot of good features for headers, footers, page numbering, etc, that make it very powerful.
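
Headers, footers and page numbers like these are usually expressed with CSS Paged Media (`@page` margin boxes plus the `page`/`pages` counters), which WeasyPrint supports as well. Below is a small sketch that builds such a stylesheet. `paged_media_css` is a hypothetical helper, and the layout (title top-center, "Page X of Y" bottom-right, no header on the first page) is just one example.

```python
def paged_media_css(title: str, size: str = "A4", margin: str = "2cm") -> str:
    """Build a CSS Paged Media stylesheet with a running header and page-numbered footer."""
    # Escape backslashes and double quotes so the title is a valid CSS string.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
@page {{
  size: {size};
  margin: {margin};
  @top-center {{ content: "{safe_title}"; }}
  @bottom-right {{ content: "Page " counter(page) " of " counter(pages); }}
}}
@page :first {{
  @top-center {{ content: none; }}
}}
""".strip()
```

With WeasyPrint you could apply it via `HTML(string=doc).write_pdf("out.pdf", stylesheets=[CSS(string=paged_media_css("Invoice"))])`.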

https://stirlingpdf.io also uses weasyprint!!

There was a critical book that I read two years ago that is only available online. The web presentation is full of images of maps, artifacts, etc to help contextualize the content. No PDF converter tool has ever been up to the job of just extracting the text until this one. Thank you!

I'll join the choir. We use weasyprint for ebooks and invoices and it's a joy to use. It has gained support for a huge number of features over the last few years (partially thanks to some monetary sponsorships); it started out pretty bare-bones and is now close to the commercial solutions.

The maintainers are also very responsive and helpful.

Amazing project

> Just print to PDF in a browser

I tried yesterday. With compliments to the moms of the SWEs who coded that functionality in Firefox. Apparently, putting the screen on a PDF page is an insurmountable task in 2025 (20 years ago it was still doable). I had to take a screenshot and process the picture to print it.

Orion browser produces PDFs which are exactly what you see on screen.

Most websites don't have a print stylesheet, so they don't print nicely to PDF.

But I'd upvote weasyprint for that instead.

> Most website do not have a print CSS, so it doesn’t print that nicely in PDF.

Can't they just render the screen content into a PDF? It seems easy enough for other programs to do.

Viewport size and deciding where to paginate make a naive approach to this surprisingly difficult. That said, if you control the CSS/HTML, you can often solve these problems with a short media query and some hints about where to break pages (e.g. https://developer.mozilla.org/en-US/docs/Web/CSS/break-after).
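
As a rough illustration of "a short media query and some hints", here is a sketch that builds an `@media print` block and injects it into a page's `<head>`. The helpers `print_css` and `inject_style` and the default selectors (`section`, `nav`, `aside`, `.no-print`) are assumptions; adjust them to your own markup.

```python
from typing import Iterable


def print_css(break_after: Iterable[str] = ("section",),
              hide: Iterable[str] = ("nav", "aside", ".no-print")) -> str:
    """Return a print-only stylesheet with page-break hints."""
    rules = []
    hidden = ", ".join(hide)
    if hidden:
        # Drop on-screen chrome that makes no sense on paper.
        rules.append(f"  {hidden} {{ display: none; }}")
    for selector in break_after:
        # Force a new page after each matching element.
        rules.append(f"  {selector} {{ break-after: page; }}")
    # Keep figures, tables and code blocks from being split across pages.
    rules.append("  figure, table, pre { break-inside: avoid; }")
    return "@media print {\n" + "\n".join(rules) + "\n}"


def inject_style(document: str, css: str) -> str:
    """Insert a <style> element just before </head>, or prepend it if there is no head."""
    tag = f"<style>\n{css}\n</style>"
    idx = document.lower().find("</head>")
    if idx == -1:
        return tag + document
    return document[:idx] + tag + document[idx:]
```

Because the rules sit inside `@media print`, the on-screen layout is untouched. Only browser print-to-PDF (or a tool that applies print media, as WeasyPrint does) picks them up.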

Prince XML looks nice, but what about creating a PDF directly from a live website? That often introduces problems, for example links that still point to other pages on the web. Still, in my experience the browser's print-to-PDF is often not good enough.

Yes, I did that for a recent small program. The @media print media query is powerful enough for most of the stuff I wanted to format nicely. Even page breaks are possible.

These two are the only right answers if you want a reliable, reproducible, relatively low-resource experience. In my experience, a setup that runs a browser engine has always been hard to maintain in the long run.

+1 - Weasyprint is an excellent tool for making PDFs from HTML content, and we're using it at work (with django) to export various documents.

Seconded. In my eccentric workflow, I use Weasyprint to convert HTML emails to more portable PDFs. A surprisingly successful experiment.
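
For anyone wanting to try the email workflow, the workflow isn't described in detail, so this is only a guess at the shape: pull the HTML part out of a raw `.eml` with Python's standard `email` package, then hand it to WeasyPrint. `html_from_eml` and `eml_to_pdf` are hypothetical names. Inline images referenced via `cid:` URLs won't resolve without extra work, for example a custom `url_fetcher`.

```python
from email import policy
from email.parser import BytesParser
from typing import Optional


def html_from_eml(raw: bytes) -> Optional[str]:
    """Return the HTML body of a raw RFC 822 message, or None if it has none."""
    # policy.default makes the parser return an EmailMessage with get_body().
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # get_body() walks multipart/alternative and picks the text/html part.
    part = msg.get_body(preferencelist=("html",))
    return part.get_content() if part is not None else None


def eml_to_pdf(raw: bytes, out_path: str) -> bool:
    """Write the email's HTML body to out_path as a PDF; False if there is no HTML part."""
    html = html_from_eml(raw)
    if html is None:
        return False
    from weasyprint import HTML  # lazy import
    HTML(string=html).write_pdf(out_path)
    return True
```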