Hacker News

It is arguably the most important performance figure. When I run an LLM that mostly fits on my GPU, the GPU runs at about 30% of its maximum power draw, probably because the memory can't feed the ALUs fast enough. Similarly, for the part that runs on the CPU, the cores show 100% utilization but draw less power than they usually do under full load. The GUI also gets choppier than usual under that load (noticeable, but not too annoying), presumably because pixel pushing also needs nontrivial memory bandwidth, which is hard to come by at that point.
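A rough back-of-envelope sketch of why bandwidth dominates, assuming single-stream decoding where every weight is read from memory once per generated token, and assuming the GPU-resident and CPU-resident layers run one after the other so their times add. The function name and all sizes/bandwidths below are hypothetical illustration values, not measurements from my setup:

```python
def decode_tokens_per_second(gpu_bytes, gpu_bw, cpu_bytes, cpu_bw):
    """Upper bound on decode speed if each token must stream all weights once.

    Hypothetical helper. Sizes are in bytes, bandwidths in bytes/s.
    Assumes the GPU part and the CPU part execute sequentially, so their
    per-token times add up; compute time is ignored entirely.
    """
    seconds_per_token = gpu_bytes / gpu_bw + cpu_bytes / cpu_bw
    return 1.0 / seconds_per_token


GB = 1e9
# Hypothetical example: 16 GB of weights on a GPU with 500 GB/s of memory
# bandwidth, plus 4 GB left in system RAM with 50 GB/s.
rate = decode_tokens_per_second(16 * GB, 500 * GB, 4 * GB, 50 * GB)
print(f"{rate:.1f} tokens/s")  # prints "8.9 tokens/s"
```

With these made-up numbers the 4 GB in system RAM takes 0.08 s per token versus 0.032 s for the 16 GB on the GPU, so the slower memory sets the pace no matter how many ALUs are available.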