Unicode has U+200B ZERO WIDTH SPACE for that purpose. In HTML and hence Markdown you can also use `<wbr>`. If you’re using a custom setup anyway, you can have it be inserted automatically by regex replacement, as a pre-rendering step.
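
As a rough sketch of that kind of pre-rendering step (the function name and the exact rule are assumptions for illustration, not part of any Markdown tool), a regex can insert U+200B after each em dash that is directly followed by a visible character:

```python
import re

EM_DASH = "\u2014"
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# An em dash directly followed by a non-whitespace character.
# U+200B is excluded explicitly so that running the step twice
# does not insert a second zero-width space.
_DASH_BEFORE_TEXT = re.compile(EM_DASH + r"(?=[^\s\u200b])")


def add_break_opportunities(text: str) -> str:
    """Hypothetical helper: add a line-break opportunity after em dashes."""
    return _DASH_BEFORE_TEXT.sub(EM_DASH + ZWSP, text)
```

This only adds a line-break opportunity; it does not change how a newline in the source is turned into a space.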

I think you’ve misunderstood something? This is about suppressing the turning of a segment break into a space, not about line break opportunities.

> Unicode has U+200B ZERO WIDTH SPACE for that purpose.

ZWSP is not at all “for that purpose”. If you mean this:

  A—&ZeroWidthSpace;
  B

Well, I am mildly surprised to find that no extra space is added in Gecko or Blink. But in WebKit a space is still added, since this falls under the “UA-defined” bit I quoted.

And if you’re willing to do preprocessing, you can just merge the lines; that would actually work.

> In HTML and hence Markdown you can also use `<wbr>`.

I fail to see how `<wbr>` is relevant.

Indeed, I skimmed a bit and misread “unable to break” to mean that you wanted a line-break opportunity but the renderer didn’t allow one when a letter directly follows an em dash. But it’s the other way around: you want a line break in the source after an em dash not to translate into a space in the rendering. This would likewise be possible to handle by regex replacement as a pre-rendering step.
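
A minimal sketch of what such a replacement could look like, assuming the goal is simply to join a line ending in an em dash with the line that follows it (the function name is made up for illustration, and this naive version does not skip code blocks):

```python
import re

EM_DASH = "\u2014"

# An em dash, optional trailing spaces/tabs, one newline, and any
# indentation on the next line. The lookahead requires a visible
# character, so a blank line (paragraph break) is left alone.
_BREAK_AFTER_DASH = re.compile(EM_DASH + r"[ \t]*\r?\n[ \t]*(?=\S)")


def join_after_em_dash(source: str) -> str:
    """Hypothetical helper: merge a line ending in an em dash with the next."""
    return _BREAK_AFTER_DASH.sub(EM_DASH, source)
```

Because it also strips trailing spaces after the dash, it would swallow a two-space Markdown hard line break in that position; whether that matters depends on the source text.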

More generally, I see markup languages and the details of how they are rendered as largely orthogonal. You don’t necessarily need to invent a different markup language in order to adjust the rendering.

> More generally, I see markup languages and the details of how they are rendered as largely orthogonal. You don’t necessarily need to invent a different markup language in order to adjust the rendering.

There’s not much to a markup language beyond how it’s rendered. If you don’t ever want to render it to something other than plain text, just write plain text however you desire. The reason for choosing a particular markup language is to express intended semantics (for plain-text and rendered use), and to render it. The semantics aspect is legitimate, so I won’t say the language and rendering are identical or parallel, but they’re definitely nothing like orthogonal. If you’re using a CommonMark pipeline, any preprocessing you do means you’re not actually writing in CommonMark, but an incompatible variant of it. You may well deem it worthwhile, but it’s no longer the same markup language.