Not many workloads are limited by RAM bandwidth. Power and latency are much more common bottlenecks, and HBM loses on both of those.

Multicore workloads do tend to hit RAM bandwidth limits before they hit power constraints. If you do the math, with every core busy at max frequency you usually get only a byte or so of RAM bandwidth per core per clock cycle, or perhaps a mere handful of bytes on the highest-performance systems with in-package RAM.
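To make "do the math" concrete, here is a minimal back-of-envelope sketch. The hardware figures are hypothetical assumptions chosen for illustration, not measurements of any particular product: a 16-core desktop at 5 GHz with dual-channel DDR5-5600 (64 bits per channel), and an in-package design with 12 cores at 3.5 GHz sharing an assumed 400 GB/s.

```python
# Back-of-envelope: peak RAM bandwidth available per core per clock cycle.
# All hardware numbers below are illustrative assumptions, not measurements.

def ddr_bandwidth_gb_s(mega_transfers: float, channels: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DDR bandwidth in GB/s: transfer rate x channel width x channel count."""
    return mega_transfers * bytes_per_transfer * channels / 1000

def bytes_per_core_cycle(bandwidth_gb_s: float, cores: int, clock_ghz: float) -> float:
    """Bandwidth divided by total core clock cycles per second (the 1e9 factors cancel)."""
    return bandwidth_gb_s / (cores * clock_ghz)

# Hypothetical desktop: 16 cores at 5 GHz, dual-channel DDR5-5600.
desktop_bw = ddr_bandwidth_gb_s(5600, channels=2)
print(f"desktop:    {desktop_bw:.1f} GB/s -> "
      f"{bytes_per_core_cycle(desktop_bw, 16, 5.0):.2f} bytes/core/cycle")

# Hypothetical in-package RAM: 12 cores at 3.5 GHz sharing 400 GB/s.
print(f"in-package: 400.0 GB/s -> "
      f"{bytes_per_core_cycle(400, 12, 3.5):.2f} bytes/core/cycle")
```

These are theoretical peaks; sustained bandwidth is lower in practice. For comparison, a modern core can typically issue more than one multi-byte load per cycle from its caches, so a streaming workload running on all cores saturates RAM long before it saturates the cores.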

Isn’t memory bandwidth super relevant for AI?

It's basically the most important performance figure. When I run an LLM that mostly fits on my GPU, the GPU draws only about 30% of its maximum power, probably because the memory can't feed the ALUs fast enough. Similarly, for the part that runs on the CPU, the cores show 100% utilization but draw less power than they usually do under full load. The GUI is also choppier than usual under full load (noticeable, but not too annoying), presumably because pixel pushing also needs nontrivial memory bandwidth, which is hard to come by.
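A rough roofline-style sketch of why the ALUs would sit idle. It assumes single-stream generation (batch size 1) where every generated token reads each weight once and does about 2 FLOPs (a multiply and an add) per weight. The model size, quantization, bandwidth and TFLOPS figures are hypothetical, and the sketch ignores KV-cache traffic, CPU offload and power modeling entirely.

```python
# Roofline-style ceiling for single-stream LLM token generation.
# Assumptions: each token reads every weight once and does ~2 FLOPs per weight;
# KV-cache reads, activations and CPU offload are ignored. Hardware is hypothetical.

def decode_limits(params_billion: float, bytes_per_param: float,
                  mem_bw_gb_s: float, peak_tflops: float) -> dict:
    weight_bytes = params_billion * 1e9 * bytes_per_param
    flops_per_token = 2 * params_billion * 1e9
    # Tokens/s if memory bandwidth were the only limit.
    bw_bound = mem_bw_gb_s * 1e9 / weight_bytes
    # Tokens/s if arithmetic throughput were the only limit.
    compute_bound = peak_tflops * 1e12 / flops_per_token
    tokens_per_s = min(bw_bound, compute_bound)
    return {
        "tokens_per_s": tokens_per_s,
        "memory_bound": bw_bound < compute_bound,
        # Fraction of peak FLOPs actually used at that token rate.
        "alu_utilization": tokens_per_s / compute_bound,
    }

# Hypothetical: 7B parameters at 4 bits (0.5 bytes), GPU with 1000 GB/s and 80 TFLOPS.
r = decode_limits(7, 0.5, 1000, 80)
print(f"{r['tokens_per_s']:.0f} tokens/s ceiling, memory-bound={r['memory_bound']}, "
      f"ALUs busy at most {r['alu_utilization']:.0%}")
```

Under these assumptions the model performs only about 4 FLOPs per byte of weights fetched, while the hypothetical GPU could do about 80 FLOPs per byte it can fetch, so most of the arithmetic units wait on memory. That is consistent with, though not a model of, the low power draw described above.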

Yes, and that's why we use HBM for AI (among other things), but that's the exception. For things like games or displaying webpages, it's not very important, and we generally don't put HBM into hardware built for those.