It's arguably the most important performance figure. When I run an LLM that mostly fits on my GPU, the GPU draws only about 30% of its maximum power, probably because the memory can't feed the ALUs fast enough. The same goes for the part that runs on the CPU: the cores show 100% utilization but don't consume as much power as they usually do under full load. The GUI also gets choppier than usual (noticeable, but not too annoying), presumably because pixel pushing needs some nontrivial memory bandwidth too, and that bandwidth is hard to come by while the model is running.
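A rough back-of-the-envelope sketch of why the ALUs sit idle, assuming single-stream decoding where every weight is read from memory once per generated token and each weight costs about 2 FLOPs (one multiply-add). The GPU figures (1 TB/s, 80 TFLOP/s) and the 7B-parameter, 1-byte-per-weight model are hypothetical round numbers, not measurements from the setup described above; the compute ceiling follows the simple roofline model.

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if all weights must be streamed once per token."""
    return bandwidth_bytes_per_s / model_bytes


def compute_utilization_ceiling(peak_flops: float, bandwidth_bytes_per_s: float,
                                flops_per_byte: float) -> float:
    """Roofline model: fraction of peak compute reachable at a given arithmetic intensity."""
    attainable = min(peak_flops, bandwidth_bytes_per_s * flops_per_byte)
    return attainable / peak_flops


if __name__ == "__main__":
    # Hypothetical GPU and model, chosen only for illustration.
    bandwidth = 1e12       # 1 TB/s memory bandwidth
    peak = 80e12           # 80 TFLOP/s peak compute
    model_bytes = 7e9      # 7B parameters at 1 byte each
    intensity = 2.0        # ~2 FLOPs per weight byte during decoding

    print(f"max tokens/s: {max_tokens_per_second(model_bytes, bandwidth):.1f}")
    print(f"compute ceiling: {compute_utilization_ceiling(peak, bandwidth, intensity):.1%}")
```

Under these assumptions the ALUs could be busy at most a few percent of the time, which is consistent with a GPU that reports full utilization but draws well below its power limit. Real workloads add caches, batching and KV-cache traffic, so treat this only as an order-of-magnitude illustration.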
Yes, and that's why we use HBM for AI (among other things), but that's an exception. For things like games or displaying web pages, it's not as important, and we generally don't put HBM into hardware meant for that.