Intel has proposed APX to address this. It does away with some of the 32-bit garbage that complicates design for no good payoff. Most importantly, it increases from 16 to 32 registers and allows 3-register instructions (almost all x86 instructions are 1-register or 2-register instructions). This would strip out tons of MOV instructions which was proven with AMD64 to have a decent impact on performance.
Reduced I-Cache, uop cache, and decoder pressure would also have a beneficial impact. On the flip side, APX instructions would all be an entire byte longer than their AMD64 counterparts, so some of the benefits would be more muted than they might first appear and optimizing between 16 registers and shorter instructions vs 32 registers with longer instructions is yet another tradeoff for compilers to make (and takes another step down the path of being completely unoptimizable by humans).
>This would strip out tons of MOV instructions which was proven with AMD64 to have a decent impact on performance.
Sure, but the topic is optimizing power efficiency by removing support for an instruction set. That aside, if an instruction isn't very performant, it isn't much of an issue per se. It just means it won't get used much and so chip design resources will be suboptimally allocated. That's a problem for Intel and AMD, and for nobody else.