I agree that cache efficiency is important. You can never have enough L1. It seems to me that compressed instructions à la ARM Thumb and RISC-V Compressed give you most of what you really want. One of the problems in the CISC era was that the compilers actually didn’t generate many of the fancy instructions, so it’s unclear whether you’d get back much from decoding massive amounts of micro-ops and letting the superscalar scheduler work it out if the compiler is mostly generating the simple instructions anyway. That said, the compilers of that era were also less sophisticated, so maybe we’d do better now.
In the end, though, I don’t see CISC making any significant comeback, other than perhaps in embedded, where code size is definitely important, speeds are generally lower, and multi-cycle execution is OK. But it feels like we already have all the ISAs we need to cover that space pretty well.