Hacker News

I think a PDF 2.0 would just be an extension of a single-file HTML page with a fixed viewport.

I presume you meant that as "PDF next generation", because PDF 2.0 already exists: https://en.wikipedia.org/wiki/History_of_PDF#ISO_32000-2:_20...

Also, absolutely not to your "single file HTML" theory: it would still allow JavaScript and random image formats (via data: URIs); conversely, I don't _think_ one can embed fonts in a single-file HTML page (i.e. not using the same data: URI trick); and, to the best of my knowledge, there's no cryptographic signing for HTML at all.
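To illustrate the data: URI point: a base64 data: URI (RFC 2397) is just a MIME type label plus the encoded bytes, so a single-file HTML page can inline any image format the viewing browser happens to decode, and nothing in the markup constrains it. A minimal Python sketch; the helper names `to_data_uri` and `inline_img` are hypothetical.

```python
import base64
import html


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI (RFC 2397)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_img(payload: bytes, mime_type: str, alt: str = "") -> str:
    """Return an <img> element whose image bytes live inside the HTML itself."""
    # Nothing here restricts mime_type (WebP, AVIF, SVG, ... all pass through);
    # whether the result renders depends entirely on the viewing browser.
    return f'<img src="{to_data_uri(payload, mime_type)}" alt="{html.escape(alt)}">'
```

For example, `to_data_uri(b"GIF89a", "image/gif")` yields `data:image/gif;base64,R0lGODlh`.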

It would also suffer from the linearization problem mentioned elsewhere: one could not display the document while it was still streaming in (browsers work around this problem by just janking elements around as the various .css and .js files resolve and parse).

I'd offer Open XPS as an alternative, even given its Empire of Evil origins, because I'll take XML over a pseudo-text, pseudo-binary file format all day, every day: https://en.wikipedia.org/wiki/Open_XML_Paper_Specification#C...

I've also heard people cite DjVu (https://en.wikipedia.org/wiki/DjVu) as an alternative, but I've never had a good experience with it, its format doesn't appear to be an ECMA standard, and (lol) its linked reference file is a .pdf.

LegionMammal978 3 days ago

As it happens, we already have "HTML as a document format". It's the EPUB format for ebooks, and it's just a zip file filled with an HTML document, images, and XML metadata. The only limitation is that all viewers I know of are geared toward rewrapping the content according to the viewport (which makes sense for ebooks), though the newer specifications include an option for fixed-layout content.
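A minimal sketch of that "zip of HTML plus XML metadata" layout in Python, assuming the EPUB 3 container conventions: an uncompressed `mimetype` entry first, `META-INF/container.xml` pointing at a package document, and a navigation document. The `rendition:layout` property set to `pre-paginated` is EPUB 3's fixed-layout option mentioned above, with each page declaring its own viewport. The function name, the file names under `OEBPS/`, the placeholder UUID and date, and the 600x800 viewport are illustrative assumptions, and the output has not been run through a validator such as EPUBCheck.

```python
import html
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# rendition:layout = pre-paginated opts the whole book into fixed layout.
PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2024-01-01T00:00:00Z</meta>
    <meta property="rendition:layout">pre-paginated</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="page1" href="page1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="page1"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body><nav epub:type="toc"><ol><li><a href="page1.xhtml">{title}</a></li></ol></nav></body>
</html>
"""

# Fixed-layout pages declare their own viewport size instead of reflowing.
PAGE_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>{title}</title>
  <meta name="viewport" content="width=600, height=800"/>
</head>
<body><p>{body}</p></body>
</html>
"""


def build_fixed_layout_epub(title: str, text: str) -> bytes:
    """Return the bytes of a one-page, fixed-layout EPUB 3 file."""
    fields = {"title": html.escape(title), "body": html.escape(text)}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # "mimetype" must be the first entry and stored uncompressed,
        # so readers can recognise the format at a fixed file offset.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", PACKAGE_OPF.format(**fields))
        zf.writestr("OEBPS/nav.xhtml", NAV_XHTML.format(**fields))
        zf.writestr("OEBPS/page1.xhtml", PAGE_XHTML.format(**fields))
    return buf.getvalue()
```

Writing `build_fixed_layout_epub("Demo", "Hello")` to a file named `demo.epub` gives something an EPUB reader should open; dropping the `rendition:layout` line would hand control back to the reader's usual reflowing behaviour.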

karel-3d 3 days ago

You can "just" enforce PDF/A.

...well, there are like 50 different PDF/A versions; just pick one of them :)

cylemons 3 days ago

That, and only commercial PDF libraries support PDF/A. Apparently it is much harder than regular PDF, so open-source libs don't bother.

karel-3d 3 days ago

I am currently building (as a side project) an easy converter from PDF to PDF/A (PDF/A-3b). One downside is that it is mostly based on Ghostscript, which is licensed under the Affero GPL (mainly because the Ghostscript makers also make money selling commercial licenses); another is that, in the case of a weird font, I just convert all fonts to bitmaps ( https://bugs.ghostscript.com/show_bug.cgi?id=708479 ). It's not done yet, though: I am going through the veraPDF PDF/A test suite ( https://github.com/veraPDF/veraPDF-corpus ) and still catching bugs.
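For context, the Ghostscript side of a converter like this usually comes down to a single pdfwrite invocation. A minimal Python sketch that only assembles that command line; it is not the commenter's converter and does not do the font-to-bitmap fallback described above. The function names and defaults are hypothetical, it assumes `gs` is on PATH (on Windows the console binary is usually `gswin64c`), and while the flags are ones documented for Ghostscript's pdfwrite device, PDF/A setup details (notably the PDFA_def.ps/ICC OutputIntent step) vary between Ghostscript versions, so check the docs for yours.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def ghostscript_pdfa_args(src: Path, dst: Path, part: int = 3,
                          pdfa_def: Optional[Path] = None) -> List[str]:
    """Assemble a Ghostscript pdfwrite command line targeting PDF/A-<part>."""
    args = [
        "gs",
        f"-dPDFA={part}",                 # ask pdfwrite for PDF/A output
        "-dBATCH", "-dNOPAUSE",           # run non-interactively, exit when done
        "-sColorConversionStrategy=RGB",  # normalise colours to one colour space
        "-dPDFACompatibilityPolicy=1",    # omit non-conforming features, keep PDF/A
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
    ]
    if pdfa_def is not None:
        # Optional PostScript prologue (e.g. an edited copy of Ghostscript's
        # PDFA_def.ps) declaring the ICC OutputIntent; it must precede the input.
        args.append(str(pdfa_def))
    args.append(str(src))
    return args


def convert_to_pdfa(src: Path, dst: Path, **kwargs) -> None:
    """Run Ghostscript; raises CalledProcessError on a non-zero exit status
    and FileNotFoundError if no `gs` executable is found."""
    subprocess.run(ghostscript_pdfa_args(src, dst, **kwargs), check=True)
```

Keeping argument assembly separate from execution makes the flags easy to inspect or log, and the output can then be checked with a PDF/A validator such as veraPDF.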

cess11 3 days ago

As a producer of PDF files, I mainly work with PDF/A. It's not particularly hard; you just need to embed some information regarding colour space and fonts.

I use PDFBox for this purpose; it's Apache-licensed.