RISC is more about how long each instruction takes than about how many instructions there are, because consistent timing reduces pipeline bubbles and complexity. In this sense, RISC has completely won. This constraint tightly restricts how complex new instructions in the main pipeline can be, and ISAs like x86 break complex instructions down into multiple small instructions before pushing them through.
ARMv9 still has very few instruction modes and far less complexity when you compare it with x86 or any other classic CISC ISA.
> memory access performance hasn't increased to the same extent as compute performance, thus putting a relatively bigger emphasis on code density.
The problem isn't RAM. The problem is that (generally speaking) a cache can be either big or fast. x86 L1 caches were stuck at 32 KB for a decade or so; only recently have we seen larger 64 KB caches. Higher code density means more cache hits, and that is the big reason to care about code density in modern CPUs.
RISC-V shows that you can remain RISC and still have great code density. Despite arguably making some bad/terrible decisions about the C instructions, RISC-V still generally beats x86 in code density by a significant margin (and growing as they add new instructions for some common cases).
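A toy way to see why compressed encodings help: count the bytes a hot loop occupies under different encodings. The byte counts below are illustrative assumptions, not measurements of real ISAs — the point is just that halving common instruction widths shrinks the I-cache footprint.

```python
# Toy model: code size of one loop iteration under three hypothetical
# encodings. Byte counts are illustrative, not real ISA data.
loop = [
    ("load",   {"x86": 3, "rv32": 4, "rv32c": 2}),  # e.g. c.lw fits in 2 bytes
    ("add",    {"x86": 2, "rv32": 4, "rv32c": 2}),  # e.g. c.add fits in 2 bytes
    ("store",  {"x86": 3, "rv32": 4, "rv32c": 2}),
    ("branch", {"x86": 2, "rv32": 4, "rv32c": 2}),
]

for isa in ("x86", "rv32", "rv32c"):
    total = sum(sizes[isa] for _, sizes in loop)
    print(f"{isa}: {total} bytes per iteration")
```

With half-size encodings for the common cases, twice as many loop bodies fit in the same 32 KB I-cache, which is where the density win actually shows up.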
Not really. A RISC design can have very complex timing and pipelines, with instructions cracked into uOPs (and fused into uOPs!) just like x86: https://dougallj.github.io/applecpu/firestorm.html
Caches can be fast and very expensive (in area & power)! I have an HP PA-RISC 8900 with 768KB I&D caches. They are relatively fast and low latency, given the time-frame of their design. They also take up over half the die area.
I don't know how this has anything to do with what I said.
The original intent of uops in x86 was to break more complex instructions down into simpler ones so the main pipeline's timing wasn't super-variable.
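A minimal sketch of that cracking step (instruction and uop names are made up for illustration): a CISC-style instruction with a memory operand gets split into a load uop plus a simple ALU uop, so everything entering the main pipeline looks uniformly RISC-like.

```python
# Toy "cracker": split an instruction with a memory source operand into
# a load uop followed by a register-only ALU uop. Names are hypothetical.
def crack(instr):
    op, dst, src = instr
    if src.startswith("["):           # memory operand -> load + ALU uop
        addr = src.strip("[]")
        return [("load", "tmp0", addr), (op, dst, "tmp0")]
    return [instr]                    # already simple: passes through as one uop

print(crack(("add", "rax", "[rbx]")))
# -> [('load', 'tmp0', 'rbx'), ('add', 'rax', 'tmp0')]
```

The decoder pays the cost of the split once, up front, and the rest of the machine only ever schedules simple, predictable uops.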
If you look at new designs like Apple's M-series (or even modern x86 designs), they try very hard to ensure each instruction/uop retires in a uniform number of cycles (I've read[0] that even some division is done in just two cycles), to keep the pipeline busy and reduce the number of timings that have to be tracked through the system. There are certainly instructions that take multiple cycles, but those go down the longer secondary pipelines, and if there is a hard dependency, they will cause bubbles and stalls.