Halfway through this great video and I have two questions:
1) Can we take this library and turn it into a generic driver or something that applies the technique to all software (kernel and userspace) running on the system? i.e. If I want to halve my effective memory in order to completely eliminate the tail latency problem, without having to rewrite legacy software to implement this invention.
2) What model miniature smoke machine is that? I instruct volunteer firefighters and occasionally do scale model demos to teach ventilation concepts. Some research years back led me to the "Tiny FX" fogger which works great, but it's expensive and this thing looks even more convenient.
1. Not that I can think of, due to the core split. It really has to be independent cores racing independent loads. Anything clever you could do with kernel modules, page-table-land, or dynamically reacting via PMU counters would likely cost microseconds, far larger than the 10s-100s of nanoseconds you gain.
What I wish I'd had during this project is a hypothetical hedged_load ISA instruction: issue two requests to two memory controllers and drop the loser. That would let the strategy work on a single thread! Or, even better, integrating the behavior into the memory controller itself, which would be transparent to all software without recompilation. But you'd have to convince Intel/AMD/someone else :)
2. It’s called a “smokeninja”. Fairly popular in product photography circles, it’s quite fun!
> Or, even better, integrating the behavior into the memory controller itself, which would be transparent to all software without recompilation.
Yeah it would be neat to just flip a BIOS switch and put your memory into "hedge" mode. Maybe one day we'll have an open source hardware stack where tinkerers can directly fiddle with ideas like this. In the meantime, thanks for your extensive work proving out the concept and sharing it with the world!
If you're able to do it at the memory controller level, would it be as simple as making two controllers always operate in lock-step, so their refresh cycles are guaranteed to be offset 50% from one another?
Given that the controller can already defer refresh cycles, and the logic to determine when that happens sounds fairly complex, I suspect that might already be in CPU microcode.
...which raises the tantalizing possibility that this lockstep-mirrored behavior might also be doable in microcode.
Is there a reason you can think of why AMD, Intel etc. would not want to do this?
Really enjoyed the video and feel that I (not being in the IT industry) better understand CPUs and RAM now.
I cannot think of any reason they would not want to do it.
However, I do see at least two downsides to this method.
Number one, it takes at least 2x the memory, which has for a long time been a big part of the cost of a computer. But I could see some people saying 'whatever, buy 8x'.
The second is data coherency. In a read-only environment this would work very nicely. In a write-heavy environment it would mean 2x the writes, and you'd have to wait for them all to complete, or somehow mark the data as not ready for the next group of reads. It would be fine if the read of that page came some time after the write, but it's another place where things could stall out.
Really liked her vid. She explained it very nicely. She exudes that sense of joy I used to have about this field.
> halve my effective memory in order to completely eliminate the tail latency problem,
Wouldn't you have a tail latency problem on the write side though, if you just blindly apply it everywhere? As in, unless all the replicas are done writing, you can't proceed.
Brio 33884. It has a tiny ultrasonic humidifier in there.