Even the high-profile datacenters I had to deal with in Frankfurt had the same issues. There were regular maintenance tests where they made sure the generators were working properly... I imagine this is more of a pray-and-sweat task than anything that's really in your hands. I have no clue why this is the status quo, though.
The phone utility where I live has diesel generators that kick on whenever the power goes out in order to keep the copper phone lines operational. These generators always work, or at least one of the four they have in each office does.
The datacenter I was in for a while had the big gens, with a similar "phone utility" setup - they would cut over to the backup gens once a month and run for longer than the UPS could hold the facility (if they detected an issue, they'd switch back to utility power).
They also had redundant gensets (2x the whole facility, 4x 'important stuff' - you could get a bit of a discount by being willing to be shut off in a huge emergency where gens were dying/running out of fuel).
I wonder why we don't put battery backups in each server/switch/etc. Basically, just be a laptop in each 1U rack space instead of a desktop.
Sure, you can't have much runtime, but if you got like 15 minutes for each device and it always worked, you could smooth over a lot of generator problems when something chews through the building's main grid connection.
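Back of the envelope (a rough Python sketch; every number here is an assumption, not a measurement): a 1U box drawing a few hundred watts only needs on the order of 100 Wh to ride out 15 minutes, which is roughly one large laptop pack, before you worry about discharge-rate limits and conversion losses.

    # Energy needed for a per-device battery to bridge a generator start.
    # All figures below are assumptions for illustration.
    power_draw_w = 400        # assumed draw of a loaded 1U server
    runtime_min = 15          # desired hold-up time
    efficiency = 0.9          # assumed DC conversion / inverter efficiency

    energy_wh = power_draw_w * (runtime_min / 60) / efficiency
    print(f"~{energy_wh:.0f} Wh per server")  # ~111 Wh, about one big laptop pack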
It’s pretty common to have a rack of batteries that serves an aisle. The idea is that you have enough juice for the generator to kick in. You couldn’t run them for longer periods, and even if you could, you’d still have the AC unpowered, which would quickly lead to machines overheating and crashing. Plus the building access controls need powering too, as do lighting and a whole host of other critical systems. But the AC alone is a far more significant problem than powering the racks. (I’ve worked in places where the AC has failed; it’s not fun. You’d be amazed how much heat those systems can kick out.)
In my experience, you have the building UPS on one MDU and general supply on the other. The building UPS will power everything until the generators spin up, and if the UPS itself dies then you're still powered from the general supply.

Did lose one building about 20 years ago when the generator didn't start.

But then I assume that any services I have which are marked as three-nines or more have to be provided from multiple buildings to avoid that type of single point of failure. The services that need five-nines also take into account the loss of a major city; beyond that there's significant disruption though -- especially with internet provision, as it's unclear what internet would be left after a more widespread loss of infrastructure.
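For reference, the downtime budget per "nines" works out like this (simple arithmetic, just assuming a 365-day year):

    # Allowed downtime per year for a given number of nines.
    minutes_per_year = 365 * 24 * 60
    for nines in (3, 4, 5):
        allowed_min = minutes_per_year * 10 ** -nines
        print(f"{nines} nines: ~{allowed_min:.0f} min/year")
    # 3 nines: ~526 min/year (~8.8 h), 4 nines: ~53 min, 5 nines: ~5 min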
One challenge is that the power usage of a server is order(s) of magnitude greater than that of a laptop. This means the cost of doing what you describe is significant, and it has to be taken into account when trying to build a cluster at a competitive price...
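To put a hedged number on that (every figure below is an assumption for illustration): the battery capacity, chargers, and BMS hardware all scale roughly linearly with power draw, so the gap shows up directly in fleet-level cost.

    # Very rough cost comparison of per-device batteries at laptop vs.
    # server power levels. All prices and wattages are made-up assumptions.
    cost_per_wh = 0.5          # assumed $/Wh for a packaged Li-ion cell + BMS
    runtime_h = 0.25           # 15 minutes of hold-up
    devices = 10_000           # assumed fleet size

    for label, watts in (("laptop-class", 40), ("server-class", 400)):
        fleet_cost = devices * watts * runtime_h * cost_per_wh
        print(f"{label}: ~${fleet_cost:,.0f} across the fleet")
    # laptop-class: ~$50,000   server-class: ~$500,000, before replacement cycles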
Yeah, I agree with that. I think that power savings are a big priority for datacenters these days, so perhaps as more efficient chips go into production, the feasibility of "self-contained" servers increases. I could serve a lot of websites from my phone, and I've never had to fire up a diesel generator to have 100% uptime on it. (But, the network infrastructure uses more power than my phone itself. ONT + router is > 20W! The efficiency has to be everywhere for this to work.)
They most likely lie about the power outage. Azure does this all the time. I am so fking tired of these services. Even minor data centers in Europe have backup plans for power.
The cost of that is likely ginormous compared to their SLA obligations.
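As a toy illustration of why that trade-off can look rational (every number below is an assumption, not anything from a real SLA): credits are usually capped at a slice of the monthly bill, so rare outages can cost far less in payouts than the capital for extra redundancy.

    # Toy comparison of SLA credit exposure vs. extra redundancy capex.
    # Every number here is a made-up assumption for illustration.
    monthly_revenue = 2_000_000      # assumed monthly revenue tied to one facility
    credit_fraction = 0.10           # assumed credit tier for a bad month
    outages_per_decade = 2           # assumed frequency of credit-triggering outages

    credit_cost_per_decade = monthly_revenue * credit_fraction * outages_per_decade
    extra_redundancy_capex = 5_000_000   # assumed cost of additional gensets/UPS

    print(credit_cost_per_decade, extra_redundancy_capex)  # 400000 vs 5000000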