If Markdown allowed class and id identifiers on each content element, it would be sufficient to replace HTML.

Agreed. IDs feel like the right starting point — names before verbs.

From there, I could imagine thin protocol layers emerging on top: renderers, voice interfaces, and AI agents, each binding its own behavior to the same IDs. Markdown stays plain text. Complexity through composition, not bloat.
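
To make that concrete, here is a rough Python sketch of the idea, not a proposal for real syntax. It borrows the `{#id}` suffix from Pandoc's header attributes and, as an assumption, allows it on any line. `extract_ids` and `Layer` are hypothetical names made up for illustration.

```python
import re

# Assumed syntax: a trailing {#some-id} on any line (Pandoc only defines this for headers).
ID_PATTERN = re.compile(r"^(?P<text>.*?)\s*\{#(?P<id>[A-Za-z][\w-]*)\}\s*$")


def extract_ids(markdown: str) -> dict[str, str]:
    """Map each declared ID to the plain text of the line that carries it."""
    ids = {}
    for line in markdown.splitlines():
        match = ID_PATTERN.match(line)
        if match:
            # Drop leading heading markers so only the readable text remains.
            ids[match.group("id")] = match.group("text").lstrip("# ").strip()
    return ids


class Layer:
    """Hypothetical consumer (renderer, voice interface, agent) binding behavior to IDs."""

    def __init__(self, name):
        self.name = name
        self.bindings = {}

    def bind(self, element_id, handler):
        self.bindings[element_id] = handler

    def run(self, ids):
        # Each layer handles only the IDs it has bound and ignores the rest.
        return {i: self.bindings[i](text) for i, text in ids.items() if i in self.bindings}


doc = """# Install {#install}
Run the setup script. {#setup-step}
Plain paragraph without an ID.
"""

ids = extract_ids(doc)

renderer = Layer("renderer")
renderer.bind("install", lambda text: f'<h1 id="install">{text}</h1>')

voice = Layer("voice")
voice.bind("install", lambda text: f"Section: {text}")
voice.bind("setup-step", lambda text: f"Step: {text}")

html_out = renderer.run(ids)
speech_out = voice.run(ids)
```

The document itself only names things. The renderer and the voice layer each attach their own behavior to `install` without knowing about each other, and a line without an ID is simply left alone.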