If I recompile a program to make better use of my CPU (AVX or whatever) and it then takes 1 second to execute instead of 2, it likely did not use half the _energy_.

Obviously not. But scale it out to a fleet of 1000 servers running your program continuously, and you can now shut down 10 of them for the exact same workload.

Sure, but we're talking about compiled packages being distributed by a package manager.

Yes, but my point is: if I download the AVX version of a package instead of the SSE version and that makes my 1000 servers 10% _quicker_, that is not the same as being 10% more _efficient_.

Because typically these newer instruction sets make the CPU do things faster by drawing more power. There may be savings from having fewer servers etc., but savings in _speed_ are not the same as savings in _power_ (and sometimes it even works the opposite way).
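
To put rough numbers on it: energy is average power times runtime, so what matters is how much the power draw goes up relative to the speedup. A quick sketch of that arithmetic is below. The wattage and speedup figures are made up for illustration, not measurements of any real CPU, and `relative_energy` is just a helper name I picked.

```python
def relative_energy(speedup: float, power_ratio: float) -> float:
    """Energy of the new build relative to the old one.

    energy = average power * runtime, so
    new/old = power_ratio * (1 / speedup).
    Below 1.0 means the new build uses less energy per unit of work.
    """
    return power_ratio / speedup


# Hypothetical case 1: twice as fast, but drawing 50% more power while running.
# Half the time, yet only a quarter less energy.
twice_as_fast = relative_energy(speedup=2.0, power_ratio=1.5)

# Hypothetical case 2: 10% quicker, but drawing 20% more power.
# Faster, yet it burns more energy for the same work.
ten_percent_quicker = relative_energy(speedup=1.1, power_ratio=1.2)

print(f"2x faster at 1.5x power:    {twice_as_fast:.2f}x the energy")
print(f"10% faster at 1.2x power:   {ten_percent_quicker:.2f}x the energy")
```

The only way to know which case you are in is to measure actual power draw under your workload; the speedup alone does not tell you.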