The issue is that it's no longer actually RISC-V M at that point; you're changing the instruction set. If you're compiling RISC-V M code, it doesn't need the extra NOP.

That being said, MUL is being disabled at the software project level here, not by the CPU vendor. It's in the same linked commit that added the NOP instructions to the arithmetic routines.
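To make "disabled at the software project level" concrete, here is a rough sketch of what that kind of switch can look like in C. This is not what the linked commit does. `PROJECT_NO_HW_MUL`, `soft_mul32`, and `mul32` are names I made up for illustration. The only real thing is `__riscv_m`, which RISC-V toolchains predefine when the M extension is part of the target.

```c
#include <stdint.h>

/* Shift-and-add multiply: uses only shifts, adds, and branches, so it
 * never needs a MUL instruction. The result wraps modulo 2^32, the same
 * as a normal uint32_t multiply. */
static uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;   /* add the current shifted copy of a */
        a <<= 1;
        b >>= 1;
    }
    return result;
}

/* PROJECT_NO_HW_MUL is a hypothetical project-level switch (an assumption,
 * not taken from the linked commit). __riscv_m is predefined by the
 * compiler when the target includes the M extension. */
static uint32_t mul32(uint32_t a, uint32_t b)
{
#if defined(__riscv_m) && !defined(PROJECT_NO_HW_MUL)
    return a * b;              /* compiler may emit a hardware MUL */
#else
    return soft_mul32(a, b);   /* software fallback, no MUL needed */
#endif
}
```

On a non-RISC-V host this always takes the software path, so `mul32(6, 7)` still returns 42. The point is only that the choice lives in the project's build configuration, not in the silicon.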

If your software runs on any chip and your chip runs any software, you have a problem, but in embedded cases, you know which chip runs which software, because you designed them together.

This is very true, and it's why I don't like that Xilinx is trying to go the other way. It really gets in the way and doesn't work. I know what's connected to what and how, but their system device tree generator doesn't, and it yells really loudly about that. And I don't even need a device tree, just xparameters.h.