I tend to think it's putting the memory on the package. That has given the M1 Max over 400GB/s, which is a good 4x that of a usual dual-channel x64 CPU, and the latency is half that of going out to a DRAM slot. That is drastic, and I remember when the northbridge was first folded into the CPU by AMD with the Athlon 64 and it brought a similarly big improvement in performance. It also reduces power consumption a lot.
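For what it's worth, the bandwidth figure is easy to sanity-check, since theoretical peak bandwidth is just bus width times transfer rate. A rough sketch, assuming the 400GB/s number refers to the M1 Max's 512-bit LPDDR5-6400 interface and that the "usual" desktop is two 64-bit channels of DDR5-6400 (DDR4-3200 included for comparison); these configurations are my assumptions, not something measured here:

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# Assumed configurations (illustrative, not benchmark results).
m1_max = peak_bandwidth_gbs(512, 6400)     # 512-bit LPDDR5-6400
ddr5_dual = peak_bandwidth_gbs(128, 6400)  # 2 x 64-bit DDR5-6400
ddr4_dual = peak_bandwidth_gbs(128, 3200)  # 2 x 64-bit DDR4-3200

print(f"M1 Max:            {m1_max:.1f} GB/s")
print(f"Dual-channel DDR5: {ddr5_dual:.1f} GB/s ({m1_max / ddr5_dual:.1f}x less)")
print(f"Dual-channel DDR4: {ddr4_dual:.1f} GB/s ({m1_max / ddr4_dual:.1f}x less)")
```

These are peak numbers only; sustained bandwidth is lower, and this says nothing about latency.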

The cost is flexibility and I think for now they don't want to move to fixed RAM configurations. The X3D approach from AMD gets a good bunch of the benefits by just putting lots of cache on board.

Apple got a lot of performance out of not a lot of watts.

One other possibility on power saving is the way Apple ramps the clock speed. It's quite slow to increase from its 1GHz idle to 3.2GHz: about 100ms, and it doesn't even start for 40ms. With tiny little bursts of activity like web browsing and such, this slow transition likely saves a lot of power at a cost of absolute responsiveness.
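To put rough numbers on that, here is a toy model of the ramp. I'm assuming the clock sits at 1GHz for the first 40ms and then rises linearly to reach 3.2GHz at the 100ms mark; the real curve is surely not a straight line, so treat the shape as hypothetical:

```python
IDLE_GHZ, PEAK_GHZ = 1.0, 3.2
DELAY_MS, FULL_SPEED_MS = 40.0, 100.0  # assumed reading of the figures above

def clock_ghz(t_ms: float) -> float:
    """Assumed clock at time t_ms after a burst starts: flat, then linear ramp."""
    if t_ms <= DELAY_MS:
        return IDLE_GHZ
    if t_ms >= FULL_SPEED_MS:
        return PEAK_GHZ
    frac = (t_ms - DELAY_MS) / (FULL_SPEED_MS - DELAY_MS)
    return IDLE_GHZ + frac * (PEAK_GHZ - IDLE_GHZ)

def mega_cycles(burst_ms: float, step_ms: float = 0.01) -> float:
    """Cycles executed during a burst, in millions (GHz x ms = 1e6 cycles)."""
    n = max(1, round(burst_ms / step_ms))
    dt = burst_ms / n
    # Midpoint rule over the assumed clock curve.
    return sum(clock_ghz((i + 0.5) * dt) * dt for i in range(n))

for burst in (20.0, 100.0):
    ramped = mega_cycles(burst)
    flat_out = PEAK_GHZ * burst
    print(f"{burst:5.0f} ms burst: {ramped:6.1f} Mcycles ramped, {flat_out:6.1f} at constant peak")
```

Under these assumptions a 20ms burst never leaves 1GHz, so it gets 20 million cycles instead of the 64 million it would get at a constant 3.2GHz. That is the responsiveness cost; whether it actually saves energy depends on the power curve, which this model does not cover.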

> and the latency is half that of going out to a DRAM slot.

No, it's not. DRAM latency on Apple Silicon is significantly higher than on the desktop, mainly because they use LPDDR which has higher latencies.

I was going to mention this as well.

Source: chipsandcheese.com memory latency graphs

Yes, this saves a lot of power and adds performance. But it destroys your ecosystem and annoys a vocal user base. Apple has no ecosystem and lots of fans, so they are playing their cards right.

One small reason for the lower power consumption with on-package RAM is that you don't need active termination, which does use a few watts of power. It isn't the main reason that the Macs use less power, though.

> this slow transition likely saves a lot of power at a cost of absolute responsiveness.

Not necessarily. Running longer at a slower speed may consume more energy overall, which is why "race to sleep" is a thing. Ideally the clock would be completely stopped most of the time. I suspect it's just because Apple are more familiar with their own SoC design and have optimised the frequency control to work with their software.
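A toy energy model shows why it can go either way. This is only a sketch with made-up constants: I'm assuming dynamic power scales with the cube of frequency (voltage rising roughly with clock), plus a fixed static power that is burned whenever the core is awake, and a small sleep power otherwise. None of these numbers describe a real Apple chip.

```python
def energy_joules(freq_ghz: float, static_w: float, work_gcycles: float = 0.1,
                  period_s: float = 0.2, k: float = 0.3, sleep_w: float = 0.05) -> float:
    """Energy to finish a fixed amount of work within one period, then sleep.

    All parameters are hypothetical: k is W per GHz^3 of dynamic power,
    static_w is drawn whenever the core is awake, sleep_w when it is idle.
    """
    busy_s = work_gcycles / freq_ghz
    if busy_s > period_s:
        raise ValueError("work does not fit in the period at this frequency")
    active_w = static_w + k * freq_ghz ** 3
    return active_w * busy_s + sleep_w * (period_s - busy_s)

for static_w in (0.2, 5.0):
    slow = energy_joules(1.0, static_w)
    fast = energy_joules(3.2, static_w)
    winner = "race to sleep" if fast < slow else "run slowly"
    print(f"static {static_w:.1f} W: slow {slow:.3f} J, fast {fast:.3f} J -> {winner}")
```

With a small static cost, running slowly wins because the cubic dynamic term dominates. With a large static cost (everything that has to stay powered while the core is awake), finishing quickly and sleeping wins. Which regime a given SoC is in depends on numbers I don't have.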

Memory bandwidth is not what makes the CPU fast and efficient. The CPU doesn't even have access to the full Apple Silicon bandwidth.

On package memory increases efficiency, not speed.

However, most of the speed and efficiency advantages are in the design.