According to the recent article, HBM is about 3x less efficient than LPDDR in terms of wafer area, but its bandwidth is more than triple.
What if it's in everyone's interest to buy computers at, say, 1/3rd the rate and switch everything over to HBM?
The discrepancy between compute and memory has been growing for ages. Perhaps a painful switch to HBM is exactly what we need?
Would you rather have 3 intermediate computers with low memory bandwidth, or wait a little longer on average between upgrades, so that we all get a new computer at 1/3rd the rate but with a bandwidth gain larger than the area penalty?
These are fundamentally different points in the design space, though. HBM doesn’t have a 10 mW idle draw like LPDDR does.
Can’t put HBM in smartphones and laptops. The power drain is too great.
Not many workloads are RAM bandwidth limited. Power and latency are much more common bottlenecks, and HBM loses on both of those.
Multicore workloads do tend to hit RAM bandwidth limits before they hit power constraints. If you do the math, running at max frequency and core utilization would usually imply you could only access a byte or so per core clock cycle. Perhaps a mere handful of bytes for the highest-performance systems with in-package RAM.
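To make "do the math" concrete, here is a rough sketch of that calculation: divide peak memory bandwidth by the total number of core clock cycles per second. The three configurations are made-up round numbers chosen for illustration, not measurements of any real product, and the function name is my own.

```python
def bytes_per_core_cycle(bandwidth_gb_s: float, cores: int, clock_ghz: float) -> float:
    """Bytes of memory traffic available per core per clock cycle,
    assuming every core runs at clock_ghz and shares bandwidth evenly."""
    bytes_per_second = bandwidth_gb_s * 1e9
    core_cycles_per_second = cores * clock_ghz * 1e9
    return bytes_per_second / core_cycles_per_second


# Hypothetical systems: (peak bandwidth in GB/s, core count, clock in GHz).
# These are illustrative assumptions, not specs of real hardware.
systems = {
    "desktop, dual-channel DRAM": (90.0, 16, 5.0),
    "server, many-channel DRAM": (460.0, 96, 3.0),
    "in-package RAM": (400.0, 12, 3.5),
}

for name, (bandwidth, cores, clock) in systems.items():
    ratio = bytes_per_core_cycle(bandwidth, cores, clock)
    print(f"{name}: {ratio:.2f} bytes per core per cycle")
```

With these assumed numbers the first two land between one and two bytes per core cycle, and the in-package case lands just under ten. Real sustained bandwidth is lower than peak, so actual figures would be somewhat worse.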
Isn’t memory bandwidth super relevant for AI?
It is like the most important performance figure. When I use an LLM that mostly fits on my GPU, the GPU will run at about 30% of its maximum power consumption - probably because the memory can't feed the ALUs fast enough. Similarly for the part that runs on the CPU, the CPU cores will show 100% utilization but not consume as much power as they usually do under full load. The GUI will also be choppier than usual under full load (noticeable, but not too annoying) presumably because pixel pushing also needs some nontrivial memory bandwidth which is hard to get.
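A back-of-the-envelope sketch of why this happens, assuming single-stream generation with a dense model where producing each token means streaming roughly all of the weights from memory once. Under that assumption, tokens per second is capped by bandwidth divided by model size, no matter how fast the ALUs are. The sizes and bandwidths below are hypothetical, and the split model assumes the GPU and CPU parts run one after the other for each token.

```python
def max_tokens_per_second(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if every token requires reading all weights once."""
    return bandwidth_gb_s / model_gb


def split_tokens_per_second(gpu_gb: float, gpu_bw_gb_s: float,
                            cpu_gb: float, cpu_bw_gb_s: float) -> float:
    """Upper bound when part of the model lives in GPU memory and the rest in
    system RAM, assuming the two parts are processed sequentially per token."""
    seconds_per_token = gpu_gb / gpu_bw_gb_s + cpu_gb / cpu_bw_gb_s
    return 1.0 / seconds_per_token


# Hypothetical numbers: a 16 GB model, a GPU with 500 GB/s, system RAM with 50 GB/s.
all_on_gpu = max_tokens_per_second(16.0, 500.0)
mostly_on_gpu = split_tokens_per_second(14.0, 500.0, 2.0, 50.0)

print(f"all on GPU:     {all_on_gpu:.1f} tokens/s at most")
print(f"2 GB spills to CPU: {mostly_on_gpu:.1f} tokens/s at most")
```

In this toy setup, spilling only 2 of 16 GB to system RAM cuts the ceiling by more than half, because the slow part dominates the time per token. Meanwhile the compute units on both sides spend most of their time waiting, which would fit the low power draw described above.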
Yes, and so we use HBM for AI (among other things), but that's an exception. For things like games or displaying webpages, it's not very important, and we generally don't put HBM into devices for that.