I started treating everything as images once multimodal LLMs appeared. Even emails. It's so much more robust. Emails especially are often just a container for a PDF (e.g. a contract) that is itself a scan of a printed contract. Very, very common.
I have just moved my company's RAG indexing to images and multimodal embeddings. It works pretty well.
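
For illustration, the general shape of image-based indexing might look something like the sketch below. It is a minimal sketch under assumptions, not the actual setup described above: `PageImageIndex` and `toy_embed` are hypothetical names, pages are assumed to be already rendered to grayscale rasters, and `toy_embed` is only a stand-in (an intensity histogram) for wherever a real multimodal embedding model would be called. With a real model, the query would typically be text embedded into the same vector space as the page images.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Image = Sequence[Sequence[int]]  # grayscale page raster, pixel values 0..255


def toy_embed(image: Image, bins: int = 8) -> Vector:
    """Stand-in embedder (assumption): a unit-length intensity histogram.

    A real pipeline would call a multimodal embedding model here instead.
    """
    hist = [0.0] * bins
    for row in image:
        for px in row:
            hist[min(px * bins // 256, bins - 1)] += 1.0
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


class PageImageIndex:
    """Hypothetical in-memory index: one embedding per rendered page image."""

    def __init__(self, embed: Callable[[Image], Vector]) -> None:
        self.embed = embed
        self.entries: List[Tuple[str, Vector]] = []

    def add(self, page_id: str, image: Image) -> None:
        # Email bodies, PDF pages and scans all go through the same image path.
        self.entries.append((page_id, self.embed(image)))

    def search(self, query_vec: Vector, k: int = 3) -> List[Tuple[str, float]]:
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (page_id, sum(a * b for a, b in zip(query_vec, vec)))
            for page_id, vec in self.entries
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


# Two tiny fake "pages": one dark, one light.
dark_page = [[10] * 4 for _ in range(4)]
light_page = [[240] * 4 for _ in range(4)]

index = PageImageIndex(toy_embed)
index.add("email-1/attachment.pdf#page=1", dark_page)
index.add("email-2/body", light_page)

# Querying with the light page's own embedding should rank it first.
results = index.search(toy_embed(light_page), k=1)
print(results[0][0])
```

The point of the design is that there is a single ingestion path: whatever the source is, it gets rendered to an image and embedded, so a scanned contract inside a PDF inside an email needs no special handling.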