Maybe a dumb question, but I've always wondered: why does the (authoring?) spec not consider e.g. "doctypehtml" valid HTML if compliant parsers have to support it anyway? Why allow a situation where non-compliant HTML is guaranteed to work on a compliant parser?

It's considered a parse error [0]: the spec basically says that a parser may reject the document entirely if it occurs, but if it accepts the document, then it must act as if a space is present. In practice, browsers want to ignore all parse errors and accept as many documents as possible.

[0] https://html.spec.whatwg.org/multipage/parsing.html#parse-er...
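
To make the two options concrete, here is a toy sketch in Python. It is not the spec's tokenizer and not any browser's code: parse_doctype, ParseError, and the strict flag are names made up for illustration, and it only handles the simple "<!doctype NAME>" form. The error code string is the one the spec uses for this case.

```python
HTML_WHITESPACE = " \t\n\f\r"


class ParseError(Exception):
    """Raised only when the (hypothetical) strict parser chooses to abort."""


def parse_doctype(text, strict=False):
    """Return (name, errors) for a simple '<!doctype NAME>' token.

    strict=True models a parser that aborts at the first parse error.
    strict=False models a browser-like parser that records the error
    and carries on as if the missing space were present.
    """
    prefix = "<!doctype"
    if text[:len(prefix)].lower() != prefix:
        raise ValueError("not a doctype token")

    errors = []
    i = len(prefix)

    # Anything other than whitespace or '>' right after the keyword
    # is a parse error.
    if i < len(text) and text[i] not in HTML_WHITESPACE + ">":
        code = "missing-whitespace-before-doctype-name"
        if strict:
            raise ParseError(code)
        errors.append(code)

    # Skip whitespace before the name (there may be none after recovery).
    while i < len(text) and text[i] in HTML_WHITESPACE:
        i += 1

    # Read the name up to whitespace or '>' and lowercase it.
    start = i
    while i < len(text) and text[i] not in HTML_WHITESPACE + ">":
        i += 1
    return text[start:i].lower(), errors
```

With strict=False, "<!doctypehtml>" and "<!DOCTYPE html>" both yield the name "html"; the only difference is that the first one also reports a parse error. With strict=True the first input raises ParseError instead.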

> a parser may reject the document entirely if it occurs

Ah, that's what I was missing. Thanks! The relevant part of the spec:

> user agents, while parsing an HTML document, may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification.

(https://html.spec.whatwg.org/multipage/parsing.html#parse-er...)

Because there are multiple doctypes you can use. It's the same reason "varx" is not valid and must be written "var x".

Same reason <ahref="/page.html"> is invalid.