Every so often, when I'm doing refactoring work and my list of worries has shrunk enough that I can start thinking of new things to worry about, I worry about this: as we reduce the accidental complexity of code and condense the critical bytes of working memory tighter and tighter, we are leaning very hard on very few bytes and hoping none of them ever bit-flip.
I wonder sometimes if we shouldn't be doing what NASA does: storing values in triplicate and comparing the results of the calculations to see if they agree.
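The triple-store-and-compare idea is essentially triple modular redundancy (TMR). A minimal sketch of what that might look like in software; the `TmrCell` name and interface here are illustrative, not from any real library:

```python
class TmrCell:
    """Store a value in three copies; reads return the majority vote."""

    def __init__(self, value):
        self._copies = [value, value, value]

    def write(self, value):
        self._copies = [value, value, value]

    def read(self):
        # A single corrupted copy is outvoted by the other two.
        a, b, c = self._copies
        if a == b or a == c:
            return a
        if b == c:
            return b
        raise RuntimeError("all three copies disagree; corruption unrecoverable")


cell = TmrCell(42)
cell._copies[1] = 99          # simulate corruption of one copy
assert cell.read() == 42      # the two intact copies win the vote
```

Note this only masks a fault in one copy at a time; two coordinated flips in different copies can still outvote the truth, which is why hardware ECC uses coding schemes rather than plain replication.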
Might be worth doing the kind of "manual ECC" you're describing for a small amount of high-importance data (e.g., the top few levels of a DB's B+ tree stored in memory), but I suspect the biggest win is just to use as little memory as possible, since the probability of being affected by memory corruption is roughly proportional to the amount you use.
The precautionary principle is always about blast radius times probability. Condensing the state reduces the odds that a bit flip lands in your critical memory, but increases the damage when one does. Those two tend to scale proportionally, so if it's not a lateral move it's at least a serpentine one.
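A back-of-envelope version of that argument, under the simplifying assumptions that flips land uniformly at random and that damage per hit scales inversely with how much you've condensed the state (all numbers here are made up for illustration):

```python
flip_rate = 1e-9  # assumed flips per byte per unit time

def expected_damage(critical_bytes, damage_per_hit):
    # P(hit) is proportional to the footprint; total risk is P(hit) * damage.
    hit_probability = flip_rate * critical_bytes
    return hit_probability * damage_per_hit

# Sparse layout: 4096 critical bytes, each hit costs 1 unit of damage.
sparse = expected_damage(4096, 1.0)
# Dense layout: a quarter of the bytes, but each hit costs 4x as much.
dense = expected_damage(1024, 4.0)

assert sparse == dense  # under this model, condensing is exactly a lateral move
```

The interesting question is where the model breaks: if a hit on dense state tends to crash immediately rather than silently accumulate damage, the dense layout may actually come out ahead.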
> but increases the damage when it does.
For this to be true, I think you would have to assume an "additive" model where each time corrupt memory is accessed it does some small amount of additional "damage". But for memory holding CPU instructions, I think it's more likely that the first time a corrupt byte is read, the program crashes.