I don't know how this has anything to do with what I said.

The original intent of uops in x86 was to break complex instructions down into simpler ones so that the timing of the main pipeline wasn't wildly variable.

If you look at newer designs like the M-series (or even recent x86 designs), they try very hard to ensure each instruction/uop retires in a uniform number of cycles (I've read[0] that even some division is done in just two cycles). That keeps the pipeline busy and reduces the number of timings that have to be tracked through the system. There are certainly instructions that take multiple cycles, but those go down the longer secondary pipelines, and if there is a hard dependency on their result, they cause bubbles and stalls.
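
To make the bubble point concrete, here's a toy sketch of a single-issue, in-order pipeline. Everything in it is made up for illustration: the `Op` type, the register names, and the latencies (1 cycle for a simple op, 10 for a slow one) are hypothetical and don't correspond to any real core. It only counts how many issue slots go empty while an op waits on its inputs.

```python
from typing import NamedTuple


class Op(NamedTuple):
    dest: str      # register written by this op
    srcs: tuple    # registers read by this op
    latency: int   # cycles until the result is available (made-up values)


def count_stall_cycles(ops):
    """Count empty issue slots in a toy single-issue, in-order pipeline."""
    ready = {}      # register -> cycle at which its value becomes available
    next_slot = 0   # earliest cycle at which the next op could issue
    stalls = 0
    for op in ops:
        # An op cannot issue until all of its source registers are ready.
        operands_ready = max((ready.get(s, 0) for s in op.srcs), default=0)
        issue = max(next_slot, operands_ready)
        stalls += issue - next_slot   # slots wasted waiting: the "bubble"
        ready[op.dest] = issue + op.latency
        next_slot = issue + 1
    return stalls


# The third op has a hard dependency on the slow op's result (r2).
dependent = [
    Op("r1", ("r0",), 1),
    Op("r2", ("r1",), 10),  # slow op, e.g. something on a secondary pipeline
    Op("r3", ("r2",), 1),
]

# Same ops, but the third one only needs r0, so it doesn't wait.
independent = [
    Op("r1", ("r0",), 1),
    Op("r2", ("r1",), 10),
    Op("r3", ("r0",), 1),
]

print(count_stall_cycles(dependent))    # 9
print(count_stall_cycles(independent))  # 0
```

In this model the slow op costs nothing by itself. The 9 empty slots only show up once something right behind it needs its result. A real out-of-order core would pull independent work forward to fill those slots, which this sketch doesn't attempt to model.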