Does the M series have a flat memory model? If so, I believe that may be the difference. I'm pretty sure the entire x86 family still pages RAM access which (at least) quadruples activity on the various busses and thus generates far more heat and uses more energy.

I'm not aware of any CPU invented since the late eighties that doesn't have paged virtual memory. Am I misunderstanding what you mean? Can you expand on where you are getting the 4x number from?

I doubt any CPU has more levels of address translation, caching, and other layers of memory access indirection than AMD/Intel 64 at this point.

That's an interesting question about the number of levels of address translation. Does anyone have numbers for that, and how much latency and energy an extra layer costs?