I felt this article didn't really explain why a RISC chip that executes more instructions can be as fast as a CISC chip that executes fewer.

I think the actual explanation is that CISC instructions get decoded into roughly the same kinds of simple RISC-like micro-ops, but doing that decode takes more physical hardware. Correct?

The tradeoff, then, is smaller code (less memory spent on instructions) in exchange for more silicon and transistors in the decode hardware.
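A toy model might make that concrete. This is a minimal sketch under made-up assumptions: a hypothetical CISC instruction `ADD [r1], r2` (add a register into memory), hypothetical register names (`r1`, `r2`, and an internal temporary `t0`), and illustrative byte sizes. It is not how any real decoder is built. The point is that the decoder cracks the one memory-operand instruction into the same load/add/store sequence a RISC compiler would emit as three separate instructions, so the back end does the same work while the CISC encoding takes fewer bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    op: str      # "load", "add", or "store"
    dst: str     # destination register (for "store", the register holding the address)
    srcs: tuple  # source registers


def decode_cisc_add_mem(addr_reg: str, src_reg: str) -> list:
    """Crack a hypothetical CISC 'ADD [addr_reg], src_reg' into RISC-like micro-ops."""
    tmp = "t0"  # hypothetical internal temporary register, invisible to the programmer
    return [
        MicroOp("load", tmp, (addr_reg,)),    # t0 <- mem[addr_reg]
        MicroOp("add", tmp, (tmp, src_reg)),  # t0 <- t0 + src_reg
        MicroOp("store", addr_reg, (tmp,)),   # mem[addr_reg] <- t0
    ]


# A RISC program spells out the same three steps as three separate instructions.
RISC_PROGRAM = [
    MicroOp("load", "t0", ("r1",)),
    MicroOp("add", "t0", ("t0", "r2")),
    MicroOp("store", "r1", ("t0",)),
]

# Illustrative sizes: one variable-length CISC instruction vs. fixed 4-byte RISC instructions.
CISC_BYTES = 3
RISC_BYTES_PER_INSN = 4

if __name__ == "__main__":
    uops = decode_cisc_add_mem("r1", "r2")
    print(uops == RISC_PROGRAM)  # True: the execution units see identical work
    print(CISC_BYTES, len(RISC_PROGRAM) * RISC_BYTES_PER_INSN)  # 3 12: CISC code is denser
```

The work after decode is identical; what differs is where the cost lands. The CISC side pays in decoder logic (the `decode_cisc_add_mem` step has to exist in silicon), while the RISC side pays in instruction bytes fetched from memory.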