>> It replicates data across multiple, independent DRAM channels with uncorrelated refresh schedules
This is the sort of thing which was done before in a world where there was NUMA, but that is easy. Just task-set and mbind your way around it to keep your copies in both places.
The crazy part of what she's done is how to determine that the two copies don't get get hit by refresh cycles at the same time.
Particularly by experimenting on something proprietary like Graviton.
She determines that by having three copies. Or four. Or eight.
Tis just probabilities and unlikelihood of hitting a refresh cycle across that many memory channels all at once.
Right, but the impressive part is finding addresses that are actually on different memory channels.
Surprising to me that two memory channels are separated by as little as 256 bytes. The short distance makes it easier to find, surely?
Access optimization or interleaving at a lower level than linearly mapping DIMMs and channels. x86 cache lane size is 64 bytes, so it must be a multiple. Probably 64*2^n bytes.
"This is the sort of thing which was done before in a world where there was NUMA"
You sound like NUMA was dead, is this a bit of hyperbole or would really say there is no NUMA anymore. Honest question because I am out if touch.
EPYC chips have multiple levels of NUMA - one across CCDs on the one chip, and another between chips in different motherboard sockets. As a user under Linux you can treat it as if it was simple SMP, but you’ll get quite a bit less performance.
Home PCs don’t do NUMA as much anymore because of the number of cores and threads you can get on one core complex. The technology certainly still exists and is still relevant.