Understandable. I work in academic publishing, and while the "XML is everywhere" crowd is graying, retiring, or even dying :( it still remains an excellent option for document markup. Additionally, a lot of the government data produced in the US and EU makes heavy use of XML technologies. I imagine they could be an interested consumer of Nanonets-OCR. TEI could be a good choice too, as well-tested and well-developed conversions exist to other popular, less structured formats.
Do check out MyST Markdown (https://mystmd.org)! Academic publishing is a space where MyST is being used, for example https://www.elementalmicroscopy.com/ via Curvenote.
(I'm a MyST contributor)
Do you know why MyST got traction instead of RST, which seems to have had all the custom tagging and extensibility built in from the beginning?
MyST Markdown (the MD flavour, not the same-named Document Engine) was inspired by ReST. It was created to address the main pain point of ReST for incoming users (it's not Markdown!).
As a project, the tooling to parse MyST Markdown was built on top of Sphinx, which primarily expects ReST as input. Now, I would not be surprised if most _new_ Sphinx users are using MyST Markdown (but I have no data there!).
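For anyone wondering what that looks like in practice, switching an existing Sphinx project over is roughly a one-line change to conf.py. This is just a sketch, assuming the myst-parser package is installed (`pip install myst-parser`):

```python
# conf.py -- rough sketch, assuming myst-parser is installed
extensions = [
    "myst_parser",  # lets Sphinx parse MyST Markdown (.md) alongside ReST
]

# Optional: map file suffixes to parsers so .rst and .md pages can coexist
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```

For a lot of projects that's most of the switch: existing ReST pages keep working, and new pages can be written in MyST Markdown.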
Subsequently, the Jupyter Book project that built those tools has pivoted to building a new document engine that's better focused on the use cases of our audience and leans into modern tooling.
Maybe even EPUB, which is XHTML.