This. I totally agree we will see better architectures for doing the calculations, lower-energy inference hardware, and some models running locally, moving some of the "basic" inference off the grid.
It's going to move fast I think, and I would not be surprised if the energy cost of inference is 1/10 of today's in less than 5 years.
That said, Jevons paradox will likely mean we still use more power overall.