There is this weird thing that happens with hyperscale: highly centralized decision-making, extreme interconnection and interdependence of parts, and the gravitational pull of enormous amounts of money all conspire to create a system drawn by unstable attractors toward a fracturing point (slowed and mitigated at least a little by the inertia of such a large ship).

Are smaller-scale services more reliable? I think that question is too simple to be useful. Sometimes yes, sometimes no, but we know one thing for sure: when smaller services go down, the impact radius is contained. When a corrupt MBA who wants to pump short-term metrics for a bonus gains power, the damage they can do is similarly contained. Every risk factor is boxed in like this. With a hyperscale business, things can go much more wrong for many more people, and the recursive nature of vertical+horizontal integration creates a calamity engine that can be hard to correct.

Take the financial sector in '08: huge monoliths that had integrated every kind of financial service with every other kind of financial service. A handful of enormous points of failure, with every failure mode exposed to every other failure mode.

There's a reason asymmetric warfare is hard for both parties: cellular networks of small units that can act independently are extremely fault-tolerant and robust against changing conditions. Giants, when they fall, do so in spectacular fashion.

Have you considered that a widespread outage is a feature, not a bug?

If AWS goes down, no one will blame you for your web store being down, since pretty much every other online service will be seeing major disruptions too.

But when your super-small provider goes down, it's now your problem, and you'd better have some answers ready for your manager. And you'll still be affected by the AWS outage anyway, since you probably rely on some API that runs on their cloud!

> Have you considered that a widespread outage is a feature

It's a "feature" right up there with planned obsolescence and garbage culture (the culture of throw-away).

The real problem is not having a fail-over provider. Modern software is so heavily abstracted (tens, hundreds, even thousands of layers), and yet we still make the mistake of depending on just one or two of those layers to make things "go".

When your one small provider goes down, no problem: switch over to your other provider. Then laugh at the people who are experiencing AWS downtime...
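The switch-over itself can be mundane. Here's a minimal sketch in Python, assuming two independent providers that expose the same HTTP API; the endpoint URLs and the `fetch_with_failover` helper are hypothetical, purely for illustration:

```python
import urllib.error
import urllib.request

# Hypothetical base URLs: stand-ins for two independent providers
# exposing the same API.
PROVIDERS = [
    "https://api.primary.example.com",
    "https://api.backup.example.com",
]

def fetch_with_failover(path: str, timeout: float = 3.0) -> bytes:
    """Try each provider in order; return the first successful response."""
    last_error = None
    for base in PROVIDERS:
        try:
            with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as exc:
            # This provider is down, erroring, or timing out; try the next.
            last_error = exc
    raise RuntimeError("all providers failed") from last_error

# e.g. fetch_with_failover("/v1/status") only touches the backup
# when the primary fails.
```

The same idea scales up to DNS failover or load-balancer health checks; the point is simply that a fallback path exists and gets exercised.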

That just leads to an upstream single point of failure.

Very few online services are so essential that they require a fail-over plan for an AWS outage, so this is just plain over-engineering.

> Then laugh at the people who are experiencing AWS downtime...

Let's not stroke our egos too much here, mkay?

Depends on your customers understanding that. We had a gym whose 'smart' Pilates machines went down, and it was hard to explain to them that the cloud was involved.