2. uops are a cope that costs. That uop cache and cache controller uses tons of power. ARM designs with 32-bit support had a uop cache, but they cut it when going to 64-bit only designs (look at ARM a715 vs a710) which dramatically reduced frontend size and power consumption.
3. The claim was never "stuck on 4-wide", but that going wider would incur significant penalties which is the case. AMD uses two 4-wide encoders and pays a big penalty in complexity trying to keep them coherent and occupied. Intel went 6-wide for Golden Cove which is infamous for being the largest and most power-hungry x86 design in a couple decades. This seems to prove the 4-wide people right.
4. This is only partially true. The ISA impacts which designs make sense which then impacts cache size. uop cache can affect L1 I-cache size. Page size and cache line size also affect L1 cache sizes. Target clockspeeds and cache latency also affect which cache sizes are viable.