> I wonder how many out there seriously think we could ever completely rid ourselves of the CPU.
How do you class systems like the PS5 that have an APU plugged into GDDR instead of regular RAM? The primary remaining issue is the limited memory capacity.
I wonder if we might see a system with GPU class HBM on the package in lieu of VRAM coupled with regular RAM on the board for the CPU portion?
I don’t think the remaining issue is memory capacity. CPUs are designed to handle nonlinear memory access, and that is how all modern software targeting a CPU is written. GPUs are designed for linear memory access. These are fundamentally different access patterns; the optimal solution is to have two distinct processing units.
people say this a lot, but with little technical justification.
gpus have had cache for a long time. cpus have had simd for a long time.
it's not even true that the cpu memory interface is somehow optimized for latency - it uses burst transfers, for instance, has large non-sequential and out-of-page latencies, and has gotten wider over time.
mostly people are just comparing the wrong things. if you want to compare a mid-to-high-end discrete gpu with a cpu, you can't use a desktop cpu. instead use a ~100-core server chip that also has a 12x64-bit memory interface. similar chip area, power dissipation, cost.
not the same, of course, but recognizably similar.
none of the fundamental techniques or architecture differ. just that cpus normally try to optimize for legacy code, but gpus have never done much ISA-level back-compatibility.
GDDR has high bandwidth but limited capacity. Regular RAM is the opposite, leaving typical APUs memory bandwidth starved.
Both types of processor perform much better with linear access. Even for data already in the CPU cache you get a noticeable speedup.
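A rough sketch of the point, as a Python microbenchmark (all names here are mine, and pure-Python timings are noisy, but the shape of the experiment is the standard one: do the same work in sequential vs. shuffled index order and compare):

```python
# Sketch: same total work, different access order. On typical hardware
# the sequential walk benefits from cache lines and the prefetcher;
# the shuffled walk defeats both.
import random
import time

N = 1_000_000
data = list(range(N))
seq_idx = list(range(N))
rnd_idx = seq_idx[:]
random.shuffle(rnd_idx)

def walk(indices):
    total = 0
    for i in indices:
        total += data[i]
    return total

t0 = time.perf_counter()
s1 = walk(seq_idx)
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
s2 = walk(rnd_idx)
t_rnd = time.perf_counter() - t0

assert s1 == s2  # identical work, only the access pattern differs
print(f"sequential: {t_seq:.3f}s  shuffled: {t_rnd:.3f}s")
```

In a compiled language with a flat array the gap is much larger, since the interpreter overhead that dominates here is gone.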
The primary difference is that GPUs want large contiguous blocks of "threads" to do the same thing (because in reality they aren't actually independent threads).
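One way to picture why they aren't independent threads: lanes in a warp share a single program counter, so a data-dependent branch executes both sides with inactive lanes masked off. A toy Python model of that lockstep execution (this is an assumed simplification, not any real GPU's ISA; `warp_execute` and the branch bodies are made up for illustration):

```python
# Toy SIMT model: one "warp" of lanes, one shared program counter.
# A branch doesn't let lanes diverge; both sides run in sequence,
# each under a lane mask.
def warp_execute(values):
    mask_then = [v % 2 == 0 for v in values]   # lanes taking the 'then' side
    # then-side: every lane steps together; masked-off lanes keep their value
    halved = [v // 2 if m else v for v, m in zip(values, mask_then)]
    # else-side: the complementary mask runs next
    return [v if m else 3 * v + 1 for v, m in zip(halved, mask_then)]

print(warp_execute([1, 2, 3, 4]))  # → [4, 1, 10, 2]
```

This is why divergent branches within a warp cost roughly the sum of both paths, and why GPUs want large contiguous blocks of lanes doing the same thing.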
If anything, GPUs combine large per-compute-unit private address spaces with a separate shared/global memory, which doesn't mesh very well with linear memory access per se, just high locality. You can kinda get to the same arrangement on a CPU by pushing NUMA (Non-Uniform Memory Access: only the "global" memory is truly uniform on a GPU!) to the extreme, but that's quite uncommon. "Compute-in-memory" is a related idea that points to the same constraint: you want to maximize spatial locality these days, because moving data in bulk is an expensive operation that burns power.
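The spatial-locality point shows up even in a plain 2-D traversal. A sketch (assuming the usual C-style layout where rows are stored contiguously; the function names are mine):

```python
# Sketch: same sum over a 2-D array stored row-contiguously in a flat
# list. Row-major traversal visits memory in order (stride 1);
# column-major traversal strides by COLS and touches a new region
# almost every step -- far worse spatial locality on real arrays.
ROWS, COLS = 2000, 2000
flat = [0.0] * (ROWS * COLS)

def sum_row_major():
    total = 0.0
    for r in range(ROWS):
        base = r * COLS
        for c in range(COLS):
            total += flat[base + c]      # stride 1: linear access
    return total

def sum_col_major():
    total = 0.0
    for c in range(COLS):
        for r in range(ROWS):
            total += flat[r * COLS + c]  # stride COLS: poor locality
    return total

assert sum_row_major() == sum_col_major()  # identical result either way
```

Same arithmetic, same result; only the order data moves through the cache hierarchy differs, and on large native arrays that ordering is the difference.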