Not really. A RISC design can have very complex timing and pipelines, with instructions cracked into uOPs (and fused into uOPs!) just like x86: https://dougallj.github.io/applecpu/firestorm.html
Caches can be fast and very expensive (in area & power)! I have an HP PA-RISC 8900 with 768KB I&D caches. They're relatively fast and low-latency for their day, and they take up over half the die area.
I don't see what this has to do with what I said.
The original intent of uops in x86 was to break the more complex instructions down into simpler ones, so the timing in the main pipeline wasn't super-variable.
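To make that concrete, the classic example is a memory read-modify-write. This is just an illustrative sketch I'm adding; the uop breakdown in the comments is the textbook one, and the exact count and fusion behavior differ from core to core:

    #include <stdint.h>

    /* counter[i] += 1 typically compiles to a single x86 instruction like
     *     add dword ptr [rdi + rsi*4], 1
     * One architectural instruction, but the front end cracks it into simpler
     * uops roughly along the lines of:
     *     load   tmp            <- [rdi + rsi*4]
     *     add    tmp            <- tmp + 1
     *     store  [rdi + rsi*4]  <- tmp
     * (real decoders micro-fuse some of these, so the exact uop count varies),
     * and each of those uops has simple, mostly fixed timing in the pipeline.
     */
    void bump(int32_t *counter, int64_t i) {
        counter[i] += 1;
    }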
If you look at new designs like the M-series (or even recent x86 designs), they try very hard to ensure each instruction/uop retires in a uniform number of cycles (I've read[0] that even some division is done in just two cycles) to keep the pipeline busy and reduce the number of timings that have to be tracked through the system. There are certainly instructions that take multiple cycles, but those go down the longer secondary pipelines, and if there is a hard dependency on their results they will cause bubbles and stalls.
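Here's a rough way to see the dependency effect from userspace. This is my own sketch, not from the linked writeups, and exact numbers will vary a lot by core, compiler, and how well the divider is pipelined, but the dependent chain should come out noticeably slower because the division's full latency sits on the critical path:

    /* div_chain.c - compare a dependent chain of divisions against
     * independent ones. Same total number of divisions either way. */
    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    #define ITERS 10000000

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        /* volatile so the compiler can't strength-reduce the division
         * into a multiply by a constant */
        volatile uint64_t divisor = 3;

        /* Dependent chain: every division needs the previous quotient,
         * so the divider's latency is fully exposed. */
        uint64_t x = 0xdeadbeefULL;
        double t0 = now_sec();
        for (int i = 0; i < ITERS; i++)
            x = x / divisor + 1000003;
        double t_dep = now_sec() - t0;

        /* Four independent chains: the out-of-order scheduler can keep
         * more divisions in flight, limited by divider throughput. */
        uint64_t a = 1, b = 3, c = 5, d = 7;
        t0 = now_sec();
        for (int i = 0; i < ITERS; i += 4) {
            a = a / divisor + 1000003;
            b = b / divisor + 1000003;
            c = c / divisor + 1000003;
            d = d / divisor + 1000003;
        }
        double t_ind = now_sec() - t0;

        printf("dependent:   %.3f s (x=%llu)\n", t_dep, (unsigned long long)x);
        printf("independent: %.3f s (a+b+c+d=%llu)\n", t_ind,
               (unsigned long long)(a + b + c + d));
        return 0;
    }

Build with something like "cc -O2 div_chain.c". On a core with a well-pipelined divider the independent version pulls way ahead; on one with a slow, non-pipelined divider the gap shrinks toward the divider's throughput limit.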