It could be massively improved with a special CPU instruction for racing DRAM reads. That might make it actually useful for real applications. As it is, the threading model she used here would make it incredibly difficult to use this in a real program.

There’s no point racing DRAM reads explicitly. Refreshes are infrequent and the penalty is like 5x on an already fast operation, 1% of the time.
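
To put rough numbers on that, here is a back-of-the-envelope sketch. The 5x penalty and 1% stall rate are the figures above; the 100 ns base latency is just an assumed round number for illustration, not a measurement.

```python
def expected_latency(base_ns, penalty_factor, stall_probability):
    """Mean read latency when a fraction of reads collide with a refresh."""
    stalled_ns = base_ns * penalty_factor
    return (1 - stall_probability) * base_ns + stall_probability * stalled_ns

# Assumed, illustrative base latency for a DRAM read.
base_ns = 100.0

# 5x penalty, 1% of the time (the rough figures from the comment above).
mean_ns = expected_latency(base_ns, penalty_factor=5.0, stall_probability=0.01)
overhead = mean_ns / base_ns - 1

print(f"mean latency: {mean_ns:.1f} ns, overhead: {overhead:.1%}")
```

Under those assumptions the average cost of refresh works out to about 4%, so the effect mostly shows up in tail latency rather than throughput.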

What’s better is to “race” against cache, which is 100x faster than DRAM. CPUs already do this for independent loads via out-of-order execution. While one load is stalled waiting for DRAM, another can hit the cache and do some compute in parallel. It’s all already handled at the microarchitectural level.

There are already systems that do this in hardware. Any system that has memory mirroring RAS features can do this, notably IBM zEnterprise hardware, you know, the company that this video promoter claims to be one-upping.

I don't think the memory mirroring features available today allow you to race two DRAM accesses and use the result that returns earlier?

The memory controller sends the read to the DIMM that is not refreshing. It is invisible to software, except for the side-effect of having better performance.
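
As a toy model of that idea (not how any real controller is implemented): assume two mirrored copies whose refresh windows are staggered by half a refresh interval, and a controller that routes each read to whichever copy is not refreshing. The names `is_refreshing` and `pick_copy` are made up for this sketch, and the timing constants are assumed, roughly DDR4-like values.

```python
# Assumed, illustrative timings in nanoseconds (roughly DDR4-like).
T_REFI = 7800  # interval between refresh commands
T_RFC = 350    # time a copy is unavailable during each refresh

def is_refreshing(t_ns, offset_ns):
    """True if a copy whose refresh schedule starts at offset_ns is busy at t_ns."""
    return (t_ns - offset_ns) % T_REFI < T_RFC

def pick_copy(t_ns, offsets):
    """Return the index of the first copy that is not refreshing, or None."""
    for index, offset_ns in enumerate(offsets):
        if not is_refreshing(t_ns, offset_ns):
            return index
    return None

# Two mirrored copies with refresh staggered by half an interval.
offsets = (0, T_REFI // 2)

# Count the nanoseconds in one interval during which a read would have to wait.
stalled_single = sum(is_refreshing(t, 0) for t in range(T_REFI))
stalled_mirrored = sum(pick_copy(t, offsets) is None for t in range(T_REFI))

print(f"single copy: {stalled_single} ns blocked per interval")
print(f"mirrored:    {stalled_mirrored} ns blocked per interval")
```

With staggered windows there is always one copy available in this model, so the blocked time drops to zero. No racing is needed because the controller already knows which copy is busy.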

Mirroring is more of a reliability feature though, no? From my understanding it’s like RAID where you keep multiple copies plus parity so uncorrectable errors aren’t catastrophic. Makes sense for mainframes which need to survive hardware failures.

Refresh avoidance is a tangential thing the memory controller happens to be able to do in a scheme like that, but you’d really have to be looking at it in a vacuum to bill it as a benefit.

Like I said, it’s all about cache. You’re not going to DRAM if you actually care about performance fluctuations at the scale of refresh stalls.

Clearly, hitting a cache would be the better outcome. The technique suggested here could only apply to unavoidably cold reads, some kind of table that's massive and randomly accessed. Assume it exists, for whatever reason. To answer your question, refresh avoidance is an advertised benefit of hardware mirroring. Current IBM techno-advertising that you can Google yourself says this:

"IBM z17 implements an enhanced redundant array of independent memory (RAIM) design with the following features: ... Staggered memory refresh: Uses RAIM to mask memory refresh latency."

I can google, thanks. My point is that nobody is buying mainframes with redundant memory to avoid refresh stalls. It’s a mostly irrelevant freebie on hardware you bought for fault tolerance.