It's not even ECC price/availability that bothers me so much, it's that getting CPUs and motherboards that support ECC is non-trivial outside of the server space. The whole consumer class ecosystem is kind of shitty. At least AMD allows consumer class CPUs to kinda sorta use ECC, unlike Intel's approach where only the prosumer/workstation stuff gets ECC.
288-pin ECC UDIMMs are, I believe, supported on any X670E/X870E board so long as the motherboard maker hasn’t expressly disabled it (and probably other chipsets as well?). Windows 10+ reports it as full ECC (multi-bit / 72-bit). AMD flipped that switch in an AGESA update three or four years ago, IIRC. The CAS latency for ECC is about double what gaming RAM offers, but in practice other, more costly factors tend to limit performance first. Any motherboard released before that AGESA update is harder to predict, but that’s baseline uncertainty for PCs, so no surprises there.
>The CAS latency for ECC is about double what gaming RAM offers
Ironically, overclocking ECC memory is much easier than overclocking non-ECC DIMMs, because you know exactly when you start hitting instability and need to dial back, instead of relying on application crashes and BSODs to tell you that you're running way too optimistic clocks/timings.
Meanwhile, I overclocked 'low clock / loose timing' ECC DIMMs on a Ryzen 7 platform with no issues at all: I kept raising clocks and tightening timings until ECC started reporting errors, then dialed it back a couple of notches, and now it is not just stable, I also have exact reporting telling me it's stable.
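In case anyone wants to replicate that workflow on Linux, here's a minimal sketch of the counter-watching part. It assumes the EDAC driver is loaded and exposes the usual sysfs layout (`/sys/devices/system/edac/mc/mc*/ce_count`); the 10-second polling interval is an arbitrary choice.

```rust
// Poll the EDAC error counters while a memory stress test runs, and print a
// line whenever a counter ticks up. Sketch only: assumes the Linux EDAC
// driver is loaded and exposes /sys/devices/system/edac/mc/.
use std::{collections::HashMap, fs, thread, time::Duration};

fn read_counts() -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    if let Ok(entries) = fs::read_dir("/sys/devices/system/edac/mc") {
        for entry in entries.flatten() {
            let name = entry.file_name().to_string_lossy().into_owned();
            if !name.starts_with("mc") {
                continue; // skip entries like "power"; keep mc0, mc1, ...
            }
            for counter in ["ce_count", "ue_count"] {
                if let Ok(raw) = fs::read_to_string(entry.path().join(counter)) {
                    if let Ok(n) = raw.trim().parse::<u64>() {
                        counts.insert(format!("{name}/{counter}"), n);
                    }
                }
            }
        }
    }
    counts
}

fn main() {
    let mut last = read_counts();
    loop {
        thread::sleep(Duration::from_secs(10)); // arbitrary polling interval
        let now = read_counts();
        for (key, value) in &now {
            let before = last.get(key).copied().unwrap_or(0);
            if *value > before {
                // Any increase means the current clocks/timings are too hot.
                println!("{key}: {before} -> {value}");
            }
        }
        last = now;
    }
}
```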
Yeah! A stick of 5600 can generally reach 6000 with geardown off, and that’s as far as I’ve seen cause to push it. But certain parameters that are popular to tweak for latency reduction can be, how would I put it, slightly less flexible. tREFI comes to mind as one where nearly any adjustment (on the enterprise sticks I’m using, anyway) tends to cause DFE/MBIST training failures no matter what, even with direct airflow, before the board ever boots far enough for memtest to expose ECC errors.
(For those following along at home: if you aren’t tuning with MBIST maxed out in your BIOS, you might want to revisit that.)
Same experience here. I have a feeling ECC in gaming would soon become a thing if it weren't for the pricing crisis.
I've honestly been amazed that people buy anything other than "workstation" gear, given how much more reliably and consistently it works in my experience, but I guess even a generation or two old and used, it can be expensive.
Very few applications scale with cores. For the vast majority of people, single-core performance is all they care about, and it's also cheaper. They don't need or want workstation gear.
I have come to doubt that single-core or CPU performance in general, other than maybe in specialty applications like CAD and some games, has been all that noticeable for most computer users in the last decade. I can take relatively pedestrian users like my parents or my wife and put them in front of a decade-old high-end Haswell system or a brand new mega-$$$ Threadripper/Epyc, and for almost all intents and purposes they don't notice a difference. What they do notice is when things die. I'm sure consumer hardware might be OK for 2-3 years (maybe), but my parents are happier to keep using the same computer, and honestly the Dell Precision system I gave them almost 10 years ago works great today. I suspect the hardware, outside of maybe the SSD finally wearing out, will probably still work right a decade from now too.
> Very few applications scale with cores
You mean like compilers and test suites? Very few professional workloads don't parallelize well these days.
Compilers and test suites do scale (at least for C/C++ and Rust, which is what I work with). But I think the parent comment referred to consumer applications: games, word processing, light browsing, ...
(Though games these days scale better than they used to, if only up to a point.)
I find that most tools I write for my own use can be made to scale with cores, or run so fast that the overhead of starting threads exceeds the program's runtime. But I write them in Rust, which makes parallelism easy. If I wrote that code in C++, I probably wouldn't bother trying to parallelize.
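Not their code, but as an illustration of why that's cheap to do in Rust: with the rayon crate (a third-party dependency, `rayon = "1"` in Cargo.toml; the file names below are made up), going from a serial iterator to a parallel one is usually a one-token change.

```rust
// Toy example: count lines across a set of files, serially vs. in parallel.
// Needs `rayon = "1"` in Cargo.toml; the input paths are placeholders.
use rayon::prelude::*;
use std::fs;

fn count_lines(path: &str) -> usize {
    fs::read_to_string(path).map(|s| s.lines().count()).unwrap_or(0)
}

fn main() {
    let files = vec!["a.log", "b.log", "c.log"]; // hypothetical inputs

    // Serial version.
    let total: usize = files.iter().map(|p| count_lines(p)).sum();

    // Parallel version: swap `iter` for `par_iter` and rayon spreads the
    // work across a thread pool sized to the available cores.
    let total_par: usize = files.par_iter().map(|p| count_lines(p)).sum();

    println!("serial={total} parallel={total_par}");
}
```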
But those tools aren't really compute-bound anyway; you're not buying a workstation to run them, you're getting a consumer laptop or a tablet.
And that consumer device should have ECC! That's the whole discussion here.
It's confusing because a few comments up is "for the vast majority of people, single-core performance is all they care about, and it's also cheaper", which is unrelated to ECC.
I think it's coherent -- it's an argument for why most people don't want to buy workstation-class products just to get ECC. (Prices scale with core count. Not linearly, but still.)
Why? If your device is a thin client for web services/gaming, the risk of bitflips/bad RAM is a minor annoyance.
I disagree with your handwaving bitflips away as a minor annoyance. Consumers don't love software crashing, even if they don't have any data they care about.
Imagine ECC was free -- would you rather have free ECC and no bitflips, or no ECC and bitflips? It's hard to imagine choosing bitflips.
ECC would save an unbelievable amount of labor. A shocking number of people have jobs looking at various logs.
Test suites often don't scale, actually. Unit tests usually run single-threaded by default, and relatively often they have side effects on the system that make them unsafe to run in parallel. (Sure, sure, you could definitely argue the latter is a skill issue.)
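To make the side-effect half of that concrete, here's a hypothetical Rust pair (Rust's test harness happens to run tests on multiple threads by default, which makes it an easy demo): both tests pass in isolation but can fail when scheduled concurrently.

```rust
// Sketch of the shared-state problem: both tests use the same hard-coded
// path, so when the harness runs them concurrently the write/read pairs can
// interleave and either assertion may see the other test's data.
use std::fs;

const SHARED: &str = "/tmp/fixture.txt"; // the offending shared resource

#[test]
fn writes_alpha() {
    fs::write(SHARED, "alpha").unwrap();
    assert_eq!(fs::read_to_string(SHARED).unwrap(), "alpha");
}

#[test]
fn writes_beta() {
    fs::write(SHARED, "beta").unwrap();
    assert_eq!(fs::read_to_string(SHARED).unwrap(), "beta");
}

// Blunt fix: force one thread (`cargo test -- --test-threads=1`).
// Better fix: give each test its own temp dir so nothing is shared.
```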
In theory, do you need a single machine for any of that, or would it be cheaper to use a low-availability cloud cluster? Tests are totally independent, and builds are probably parallel enough.
Only a small percentage of computer users are programmers.
There were several years where used cheese-grater Mac Pros could be bought and upgraded very cheaply, and were still not too outdated. I only replaced my MacPro4,1 when the M1 mini came out, mainly because of the wattage.
I've had zero issues with AMD's consumer tier of non-WX Threadripper and Ryzen models, FWIW.
Overblown? Billions of users use consumer-tier hardware just fine. I have servers at home with years of uptime without any ECC memory.
But how much bit rot? You’ll never know.
If I don't know about it, then how does it affect me / why should I care? My home server does what it is supposed to do and has done so for a decade. If bit rot / bit flips in memory don't affect my day-to-day life, I much prefer cheaper hardware.
I do hope the nuclear powerplant next door uses more fault tolerant hardware, though.
Eventually you might notice that the pictures or other documents you were saving on your home server have artifacts, or no longer open. This is undesirable for most people using computer storage.
> I much prefer cheaper hardware.
The cost savings are modest; on the order of 12% for the DIMMs, and less elsewhere. Computers are already extremely cheap commodities.
12% for the DIMMs only, but with Intel you need a Xeon and its accompanying motherboard for it. Someone said AMD "kinda" lets you do ECC on consumer hardware, not sure what the caveats are besides just being unbuffered.
Assuming that's more due to intentional market segmentation than actual cost, yeah I would pay 12% more for ECC. But I'm with the other guy on not valuing it a ton. I have backups which are needed regardless of bitrot, and even if those don't help, losing a photo isn't a huge deal for me.
> Someone said AMD "kinda" lets you do ECC on consumer hardware, not sure what the caveats are besides just being unbuffered.
That was me. It isn't "officially" supported by AMD, but it should work. You can enable EDAC monitoring in Linux and watch detected correction events show up.
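For anyone wanting to verify that on their own box, a minimal sketch that reads the standard EDAC sysfs attributes (`mc_name`, `ce_count`); it assumes the relevant driver (e.g. amd64_edac on Ryzen) is loaded, otherwise no mcN directory will exist at all.

```rust
// Quick sanity check that the kernel's EDAC layer actually registered an
// ECC-capable memory controller, and how many corrected errors it has seen.
use std::fs;

fn main() {
    let mut found = false;
    if let Ok(entries) = fs::read_dir("/sys/devices/system/edac/mc") {
        for entry in entries.flatten() {
            let name = entry.file_name().to_string_lossy().into_owned();
            if !name.starts_with("mc") {
                continue; // only the mc0, mc1, ... controller directories
            }
            found = true;
            let ctl = fs::read_to_string(entry.path().join("mc_name")).unwrap_or_default();
            let ce = fs::read_to_string(entry.path().join("ce_count")).unwrap_or_default();
            println!("{name}: {} corrected errors so far ({})", ce.trim(), ctl.trim());
        }
    }
    if !found {
        println!("no EDAC memory controllers registered; ECC likely not active");
    }
}
```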
> Assuming that's more due to intentional market segmentation than actual cost
That's the argument, yeah.
I'm more concerned that the Mac filesystem (APFS) doesn't have payload checksums; it checksums metadata but not file data.
I hate the workstation desktop I assembled 15 years ago. It just doesn't break! I have no excuse to buy a new one (except for the video card).