I presume you meant that as "PDF next generation" because PDF 2.0 already exists https://en.wikipedia.org/wiki/History_of_PDF#ISO_32000-2:_20...
Also, absolutely not to your "single file HTML" theory: it would still allow JavaScript and random image formats (via data: URIs). Conversely, I don't _think_ that one can embed fonts in a single-file HTML (e.g. not using the same data: URI trick), and to the best of my knowledge there's no cryptographic signing for HTML at all.
It would also suffer from the linearization problem mentioned elsewhere, in that one could not display the document while it was still streaming in (the browsers work around this problem by just janking items around as the various .css and .js files resolve and parse).
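For anyone who hasn't seen the data: URI trick, here is a minimal Python sketch of how arbitrary bytes end up inline in a single HTML file. The helper name `to_data_uri` and the six placeholder bytes are made up for illustration; a real page would encode a whole image file.

```python
import base64


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI (hypothetical helper)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes: only the GIF signature, not a complete image.
gif_signature = b"GIF89a"
uri = to_data_uri(gif_signature, "image/gif")

# The URI can be dropped straight into an <img> tag in the same file.
img_tag = '<img alt="inline" src="' + uri + '">'
print(img_tag)
```

Nothing in the HTML itself restricts which MIME type goes in there, which is the "random image formats" concern.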
I'd offer Open XPS as an alternative even given its Empire of Evil origins because I'll take XML over a pseudo-text-pseudo-binary file format all day every day https://en.wikipedia.org/wiki/Open_XML_Paper_Specification#C...
I've also heard people cite DjVu https://en.wikipedia.org/wiki/DjVu as an alternative, but I've never had a good experience with it; its format doesn't appear to be an ECMA standard, and (lol) its linked reference file is a .pdf
As it happens, we already have "HTML as a document format". It's the EPUB format for ebooks, and it's just a zip file filled with an HTML document, images, and XML metadata. The only limitation is that all viewers I know of are geared toward rewrapping the content according to the viewport (which makes sense for ebooks), though the newer specifications include an option for fixed-layout content.
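To make the "it's just a zip file" point concrete, here is a rough Python sketch that builds a toy EPUB-shaped archive in memory and then reads it back with nothing but the standard library. The file names under `OEBPS/` and both helper functions are assumptions for illustration, and the toy archive is not a valid EPUB (its package file is an empty stub).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = (
    '<container version="1.0" '
    'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
    "<rootfiles>"
    '<rootfile full-path="OEBPS/content.opf" '
    'media-type="application/oebps-package+xml"/>'
    "</rootfiles>"
    "</container>"
)


def build_toy_epub() -> bytes:
    """Create a minimal EPUB-like zip (hypothetical contents, not valid EPUB)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", "<package/>")  # stub package file
        zf.writestr("OEBPS/chapter1.xhtml", "<html><body>Hi</body></html>")
    return buf.getvalue()


def find_package_path(epub_bytes: bytes) -> str:
    """Return the package document path named in META-INF/container.xml."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no rootfile entry")
    return rootfile.attrib["full-path"]


epub = build_toy_epub()
with zipfile.ZipFile(io.BytesIO(epub)) as zf:
    names = zf.namelist()
print(names)
print(find_package_path(epub))
```

The same `find_package_path` should work on a real .epub read from disk, since the container file lives at a fixed path inside the archive.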
You can "just" enforce PDF/A.
...well, there are like 50 different PDF/A versions; just pick one of them :)
That, and only commercial PDF libraries support PDF/A. Apparently it is much harder than regular PDF, so open source libs don't bother.
I am currently building (as a side project) an easy converter from PDF to PDF/A (PDF/A-3b). One negative is that it is mostly based on Ghostscript, which is Affero GPL (mainly because the Ghostscript makers also make money selling commercial licenses); another is that in the case of a weird font, I just convert all fonts to bitmaps ( https://bugs.ghostscript.com/show_bug.cgi?id=708479 ). It's not done yet though: I am going through the veraPDF PDF/A test suite ( https://github.com/veraPDF/veraPDF-corpus ) and still catching bugs.
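For readers wondering what a Ghostscript-based conversion roughly looks like, here is a hedged Python sketch that only assembles the command line. This is not the converter described above; the function name, the file names, and the choice of `RGB` as colour conversion strategy are assumptions, and a real run also needs a `PDFA_def.ps` file edited to point at an ICC profile.

```python
import shlex


def ghostscript_pdfa_command(src: str, dst: str,
                             def_file: str = "PDFA_def.ps",
                             pdfa_level: int = 3) -> list[str]:
    """Build (but do not run) a Ghostscript command requesting PDF/A output."""
    return [
        "gs",
        f"-dPDFA={pdfa_level}",           # requested PDF/A part
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=RGB",  # assumption: RGB output intent
        "-dPDFACompatibilityPolicy=1",    # drop features that break conformance
        f"-sOutputFile={dst}",
        def_file,                         # PostScript prologue with ICC profile
        src,
    ]


cmd = ghostscript_pdfa_command("input.pdf", "output-a.pdf")
print(shlex.join(cmd))
# To actually convert, pass cmd to subprocess.run(cmd, check=True),
# then check the result with a validator such as veraPDF.
```

Whether the output really conforms depends on the input, so running a validator afterwards is the part that matters.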
As a producer of PDF files I mainly work with PDF/A. It's not particularly hard, just need to embed some information regarding colour space and fonts.
I use PDFBox for this purpose, it's Apache licensed.