Not saying those things don’t happen, but having worked with on-prem for 2 years, and having run ancient (currently 13-year-old) servers in my homelab for 5 years, I’ve never seen them. Bad CPU, bad RAM, yes - and modern servers are extremely good at detecting these and alerting you.
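For what it’s worth, the alerting part doesn’t even need vendor tooling: on Linux, the EDAC subsystem exposes per-memory-controller ECC counters in sysfs, so even a cron job can watch them. A rough sketch (the paths assume a Linux box with EDAC enabled; the alert hook here is just a print and purely illustrative):

    #!/usr/bin/env python3
    # Minimal sketch: poll the Linux EDAC sysfs counters that surface ECC
    # memory errors. Assumes a Linux host with EDAC enabled (typical on
    # server-class boards); "alerting" is just a print here.
    from pathlib import Path

    EDAC_ROOT = Path("/sys/devices/system/edac/mc")

    def read_count(path: Path) -> int:
        try:
            return int(path.read_text().strip())
        except (FileNotFoundError, ValueError):
            return 0

    def check_memory_errors() -> list[str]:
        alerts = []
        for mc in sorted(EDAC_ROOT.glob("mc*")):
            ce = read_count(mc / "ce_count")  # corrected (recoverable) errors
            ue = read_count(mc / "ue_count")  # uncorrected errors - bad news
            if ue:
                alerts.append(f"{mc.name}: {ue} uncorrected ECC errors")
            elif ce:
                alerts.append(f"{mc.name}: {ce} corrected ECC errors (watch that DIMM)")
        return alerts

    if __name__ == "__main__":
        for line in check_memory_errors() or ["no ECC errors reported"]:
            print(line)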
In my homelab, in 5 years of running the aforementioned servers (3x Dell R620, and some various Supermicros) 24/7/365, the only thing I had fail was a power supply. Turns out they’re redundant, so I ordered another one, and the spare kept the server up in the meantime. If I was running these for a business, I’d keep hot spares around.
I'm glad it's working for you! It's worked for me in the past as well, but I've also felt the pain. As I mentioned before, things will often work fine, but you do need a somewhat higher appetite for risk.
I suppose it depends on scale and requirements. A homelab isn't very relevant IMHO, because the sample size is small and the load is negligible. Push the hardware 24/7 and the cracks are more likely to appear.
A nice-to-have service can suffer some downtime, but if you're running a non-trivial/sizable business or have regulatory requirements, downtime can be rough. Keeping spare compute servers is normal, but you'll be hard-pressed to convince finance to spend big money on core services (db, storage, networking) that sit idle as backups.
Agreed that homelab load is generally small compared to a company’s (though an initial Plex cataloging run will happily max out as many cores as you give it for days).
In the professional environment I mentioned, I think we had somewhere close to 500 physical servers across 3 DCs. They were all Dell Blades, and nothing was virtualized. I initially thought that latter bit was silly, but then I saw that no, they’d pretty well matched compute to load. If needs grew, we’d get another Blade racked.
We could not tolerate unplanned downtime (or rather, our customers couldn’t), but we did have a weekly 3-hour maintenance window, which was SO NICE. It was only a partial outage for customers, and even then, usually only a subset of them at a time. Man, that makes things easier, though.
They were also hybrid AWS, and while I was there, we spun up an entirely new “DC” in a region where we didn’t have a physical one. More or less lift-and-shift, except for managed Kafka, and then later EKS.