Hacker News

This is especially ironic, considering the same people will gladly use XML syntax and serve it as text/html. Historically, this has only worked because no relevant browser has ever implemented SGML (and NET [1], in particular), as required by HTML standards up to version 4 [2].

[1] https://en.wikipedia.org/wiki/Standard_Generalized_Markup_La...

[2] https://www.w3.org/TR/html401/conform.html#h-4.2

myfonj 3 days ago

> Historically, […] no relevant browser has ever implemented SGML […] NET

I can probably confirm the "relevant" part of this claim for the period from the first decade of the 2000s onward, but I am still (somewhat desperately) looking for information on whether ANY application that consumed "HTML", however niche or obscure, treated NET as specified back then. I am quite certain the W3C Validator did (Mathias' article proves that, after all), and Amaya might have done so too, since it was a reference implementation from the same standards body, IIRC, but I cannot swear to that.

Does anybody here have a clearer recollection of those times, or even some evidence?

I still find it strange that such a feature had such a prominent place in the specs back then, but practically nowhere else.

JimDabell 3 days ago

EMACS/W3 originally supported SHORTTAG NET but was “fixed” to remove support. In practical terms, mainstream browsers couldn’t afford to parse SHORTTAG NET properly because it was very common to leave attribute values unquoted. You can leave some values unquoted, but not ones containing slashes. So the very common error <a href=http://…> would not get parsed as the author expected if SHORTTAG NET was enabled.

This is the earliest reference I could locate easily, from the www-html mailing list:

https://lists.w3.org/Archives/Public/www-html/2002Nov/0057.h...

You’ll be able to find more if you go trawling through USENET archives of places like comp.infosystems.www.authoring.html from 25–30 years ago, but it was a fairly niche subject even back then.

I think there were a couple of other niche tools that supported it, but I don’t remember the details after all this time.
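To make the unquoted-attribute problem concrete, here is a minimal Python sketch of how a start tag might be scanned under SHORTTAG NET. It is a toy, not a real SGML parser: `scan_start_tag` and the example.com URL are made up for illustration, and the set of characters accepted in an unquoted value (letters, digits, `.`, `-`, `_`, `:`) is the one HTML 4 allows for unquoted attribute values.

```python
import re

# Characters HTML 4 allows in an unquoted attribute value. Note: no '/'.
NAME_CHARS = r"[A-Za-z0-9._:-]"
NAME = r"[A-Za-z]" + NAME_CHARS + r"*"

TAG_OPEN = re.compile(r"<(" + NAME + r")")
ATTR = re.compile(
    r"\s+(" + NAME + r")"                                  # attribute name
    r"(?:=(\"[^\"]*\"|'[^']*'|" + NAME_CHARS + r"+))?"     # optional value
)


def scan_start_tag(markup):
    """Scan one start tag at the beginning of `markup` (toy SHORTTAG NET rules).

    Returns (name, attrs, closer, rest). `closer` is '>' for an ordinary
    start tag and '/' for a NET-enabling start tag.
    """
    m = TAG_OPEN.match(markup)
    if not m:
        raise ValueError("no start tag at position 0")
    name, pos, attrs = m.group(1).lower(), m.end(), {}

    # Collect attributes; an unquoted value stops at the first non-name char.
    while True:
        a = ATTR.match(markup, pos)
        if not a:
            break
        value = a.group(2)
        if value is not None and value[0] in "\"'":
            value = value[1:-1]  # strip the surrounding quotes
        attrs[a.group(1).lower()] = value
        pos = a.end()

    # Skip whitespace before the delimiter that ends the start tag.
    while pos < len(markup) and markup[pos].isspace():
        pos += 1
    if pos >= len(markup) or markup[pos] not in ">/":
        raise ValueError("unterminated start tag")
    return name, attrs, markup[pos], markup[pos + 1:]


unquoted = scan_start_tag("<a href=http://example.com/page>link</a>")
quoted = scan_start_tag('<a href="http://example.com/page">link</a>')
print(unquoted)
print(quoted)
```

With the unquoted form, the value stops at `http:` and the first slash is read as the end of a NET-enabling start tag, so the rest of the URL is left over as content (where, as far as I understand SGML, the next slash would then act as the null end tag). Quoting the value avoids all of this.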

JimDabell 3 days ago

I believe this is the exact change where support for SHORTTAG NET was removed from EMACS/W3 in order to support XHTML better:

https://github.com/emacsmirror/w3/commit/68af7c107dcbe194e30...

myfonj 3 days ago

Thanks! That's actually a really valuable insight and seems to be a promising start for an interesting investigation.

I'd even say that, at a glance, EMACS (the "W3" browser in it) seems like a possibly hugely relevant application, actually. Will look into it.

JimDabell 3 days ago

If you really want to, you could check out Evolt’s browser archive:

https://browsers.evolt.org

It’s got over a hundred ancient web browsers. I suspect none of them support SHORTTAG NET though.

myfonj 3 days ago

Good idea. I remember doing some research on this in the past, when I tried to trace the historical arguments for the infamous "should there be a space before the slash in void tags for the best compatibility"

    <br/> vs <br /> (vs <br>)

discussion, but didn't get very far then (https://stackoverflow.com/a/30880386/540955).

JimDabell 3 days ago

That’s not quite the whole story. Appendix C of the XHTML 1.0 specification provides HTML compatibility guidelines:

> This appendix summarizes design guidelines for authors who wish their XHTML documents to render on existing HTML user agents.

— https://www.w3.org/TR/xhtml1/#guidelines

And RFC 2854, which defines the text/html media type, explicitly states this is permissible to label as text/html:

> The text/html media type is now defined by W3C Recommendations; the latest published version is [HTML401]. In addition, [XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01 and which may also be labeled as text/html.

— https://datatracker.ietf.org/doc/html/rfc2854#section-2

However, even browsers that support XHTML rendering use their HTML parser for XHTML 1.0 documents served as text/html, even though they should really be parsing them as XHTML 1.0.

But yes, that extra slash means something entirely different to the SGML formulation of HTML (HTML 2.0 to HTML 4.01). HTML5 ditched SGML though, so SHORTTAG NET is no longer a thing.
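For contrast, here is a rough sketch of how a lenient, non-SGML parser treats the trailing slash. It uses Python's standard `html.parser` purely as a stand-in for tag-soup parsing; it is not an HTML5-conformant parser and not what any browser ships. The `Events` class, `events_for` helper, and sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class Events(HTMLParser):
    """Record the parse events produced by a lenient (non-SGML) HTML parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style tags such as <br />.
        self.events.append(("startend", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def events_for(markup):
    parser = Events()
    parser.feed(markup)
    parser.close()
    return parser.events


events = events_for("<p>one<br />two</p>")
text = "".join(value for kind, value in events if kind == "data")
print(events)
print(text)
```

Here the slash is simply absorbed into the tag, so no stray `>` ends up in the text, which is the behaviour XHTML-as-text/html authors were relying on.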

currysausage 3 days ago

I believe the sentence from the RFC:

> [XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01

is technically incorrect. While the XHTML 1 compatibility profile was compatible with HTML 4 as implemented by major browsers, that wasn't actually HTML 4. HTML 4 is based on SGML, while what was implemented was a combination of HTML 4 semantics with the tagsoup parsing rules that browsers organically developed. These rules were only later formalized as part of HTML 5.

The compatibility guidelines do recommend a space between <br and />, but (at least according to https://validator.w3.org/ in HTML 4 mode) this doesn't change anything about <br /> being a NET-enabling start-tag <br /, followed by a greater-than sign.

Enter this:

  <h1>Hello<br />world</h1>

and select "Validate HTML fragment", "HTML 4.01", and "Show Outline". This is the result:

  [H1] Hello>world

(Obviously nitpicking, but that's my point: the nitpickers can be out-nitpicked.)
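The validator's outline can be mimicked with a small Python sketch. This is a toy under loud assumptions, not the validator's actual algorithm: it ignores attributes, comments and entities, `character_data` is a made-up name, and `EMPTY_ELEMENTS` is only a subset of HTML 4's EMPTY elements. For an EMPTY element such as br, the NET-enabling start tag already ends the element, so the following `>` is plain text; for other elements the next `/` acts as the null end tag.

```python
import re

# A subset of the elements HTML 4 declares EMPTY (illustrative, not complete).
EMPTY_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}

TOKEN = re.compile(
    r"</[A-Za-z][A-Za-z0-9]*\s*>"                             # ordinary end tag
    r"|<(?P<name>[A-Za-z][A-Za-z0-9]*)\s*(?P<closer>[>/])"    # attribute-less start tag
)


def character_data(markup):
    """Return the text a toy SHORTTAG NET parser would see in `markup`."""
    text = []
    pos = 0
    pending_net = 0  # open elements waiting for a null end tag '/'
    while pos < len(markup):
        m = TOKEN.match(markup, pos)
        if m:
            is_net_enabling = m.group("closer") == "/"
            if is_net_enabling and m.group("name").lower() not in EMPTY_ELEMENTS:
                pending_net += 1  # content runs until the next '/'
            pos = m.end()
        elif markup[pos] == "/" and pending_net:
            pending_net -= 1  # null end tag closes the element
            pos += 1
        else:
            text.append(markup[pos])  # everything else is character data
            pos += 1
    return "".join(text)


outline = character_data("<h1>Hello<br />world</h1>")
print(outline)
```

Under these assumptions the sketch yields `Hello>world` for the fragment above, matching the validator's outline, while something like `<em/emphasised/ text` comes out as `emphasised text`.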

JimDabell 3 days ago

Haha yes. Appendix C gave compatibility guidelines, but you are right that following them doesn’t actually result in documents that can be parsed by a parser that implements SHORTTAG NET.

Elsewhere in the thread, I posted an example of SHORTTAG NET being removed from a browser to enable parsing of XHTML documents:

https://github.com/emacsmirror/w3/commit/68af7c107dcbe194e30...

Nevertheless, the text/html RFC explicitly condones Appendix C, so despite it not being fully reflective of reality, it’s still permissible to use text/html to label XHTML 1.0 documents that follow Appendix C :D