> Every 10x increase in model size requires 10x more power

Does it? I’ll be the first to admit I’m far behind in this area, but isn’t this assuming the hardware isn’t improving over time as well? Or am I missing the boat here?

Hardware gets faster, but efficiency is stalling, if not getting worse.

Hardware isn’t improving exponentially anymore, especially not on the FLOPS/watt metric.

That’s part of what motivated the transition to bfloat16 and even smaller minifloat formats, but you can only quantize so far before you’re just GEMMing noise.
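
To put a rough number on the "GEMMing noise" point, here is a minimal NumPy sketch. It is not a real bfloat16 or FP8 implementation: it only rounds each input to a given number of significand bits and ignores exponent range, subnormals, per-tensor scaling, and the fact that real hardware usually accumulates in higher precision. The helper names `round_significand` and `gemm_relative_error` are made up for illustration, and the mapping of 8 bits to bfloat16 or 4 bits to an FP8-like format is only approximate.

```python
import numpy as np


def round_significand(x, bits):
    """Round each value to `bits` significant binary digits.

    Assumption: only precision is modeled; the exponent is left unbounded.
    """
    mantissa, exponent = np.frexp(x)          # x = mantissa * 2**exponent, |mantissa| in [0.5, 1)
    scale = 2.0 ** bits
    mantissa = np.round(mantissa * scale) / scale
    return np.ldexp(mantissa, exponent)


def gemm_relative_error(bits, n=128, seed=0):
    """Relative Frobenius error of A @ B when both inputs are rounded to `bits`."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    exact = a @ b                             # float64 reference
    approx = round_significand(a, bits) @ round_significand(b, bits)
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)


if __name__ == "__main__":
    # 24 ~ float32, 8 ~ bfloat16, 4 and 2 ~ progressively smaller minifloats
    for bits in (24, 8, 4, 2):
        print(f"{bits:2d} significand bits: relative error {gemm_relative_error(bits):.2e}")
```

In this toy setup the error grows by roughly a factor of two for every bit you drop, which is why each further step down in format width buys less usable precision than the last.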