High-end CPU designs, be they ARMv7, AArch64, RISC-V or x86(-64), have parallelized pre-decoding hardware and buffers for decoded microinstructions because that too, apparently, speeds up execution. From what I understand, the differences in those subsystems that are due to ISA baroqueness are, again, minuscule.
I've been reading up on this. The differences are indeed minimal. Still not zero, but they're not the explanation for why M-series Macs outperform Intel x86 on power consumption: https://chipsandcheese.com/p/why-x86-doesnt-need-to-die