IM messages aren’t really documents. They are text with some very minimal formatting that could be expressed with markdown. Any attached media isn’t embedded in the document; it’s attached externally or rendered at the bottom.

The only example I can think of where messages are expressed as documents is Microsoft Teams. And it’s as much an example of what not to do as anything.

IM messages seem to be documents just as much as email or many things you'd normally call documents. A reasonable definition IMO would be:

    A self-contained rich formatted text file/package that optionally contains attachments or media.

I'd disagree with that for most messaging apps. Think about Discord or Slack, for example: you have a plain text message and then media attached externally. This could very well be expressed with JSON.
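Roughly what I have in mind, as a minimal sketch. The field names (`author`, `text`, `attachments`) are made up for illustration and are not the actual Discord or Slack schema:

```python
import json

# Hypothetical message shape: plain text plus externally attached media.
# None of these field names come from a real messaging API.
message = {
    "author": "alice",
    "text": "Here are the photos from yesterday",
    "attachments": [
        {"type": "image", "url": "https://example.com/photo1.jpg"},
    ],
}

# Round-trip through JSON: the text and the attachment list survive unchanged.
encoded = json.dumps(message)
decoded = json.loads(encoded)
```

The attachments sit next to the text rather than inside it, which is the whole point: nothing in the message body needs to be parsed to find the media.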

Very few messaging apps let you go beyond plain text and let you start embedding media or complex layouts inside a message.

Slack messages have a ton of formatting. You could implement it with some sort of extension on markdown but you'd have to write a custom parser. XML gives you a markup structure for free.
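To illustrate the "for free" part, here is a rough sketch using Python's standard library XML parser. The tag names (`message`, `b`, `i`, `mention`) are invented for the example and are not Slack's actual format:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline markup; the tags are assumptions made for illustration.
raw = (
    '<message>Hi <mention user="bob">@bob</mention>, '
    "this is <b>really <i>important</i></b>.</message>"
)

root = ET.fromstring(raw)


def walk(node, depth=0):
    """Collect (depth, tag, leading text) for every element, in document order."""
    rows = [(depth, node.tag, node.text or "")]
    for child in node:
        rows.extend(walk(child, depth + 1))
    return rows


# Nested formatting comes back as a tree, with no custom parsing code.
structure = walk(root)

# The plain-text fallback is just the concatenated text nodes.
plain_text = "".join(root.itertext())
```

One caveat: the Python docs warn that the stdlib XML modules are not hardened against maliciously constructed input, so a real client would need a parser configured for untrusted data.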

Slack canvases have full layouts including images.

Slack messages contain a lot of additional formatting and media.

Even WhatsApp does formatting, and so does Signal.

What is a current popular messaging app that does not have rich text features?

Eh, XML is a machine-readable generic markup language. Why would you prefer using a less powerful format like markdown in a context like message representation? XML with inline tags seems like the perfect fit.

Less powerful also means less complex and less exploitable. You can very easily grab a markdown renderer rather than trying to decode a .docx for messages.

Pretty much no messaging apps let you create messages more complex than markdown anyway.

From a security perspective, I prefer a widespread commonly used XML parser over your custom parser for markdown extensions.