Those are all much smaller. Smaller providers have a much stronger incentive to be reliable, as they will lose customers if they are not. In a corporate setting, management will say "this would not have happened if you had gone with AWS". It's the current version of "no one ever got fired for buying IBM" (we had MS and others in between).
Hetzner provides a much simpler set of services than AWS. Less complexity to go wrong.
A lot of people want the brand recognition too. It's also become the standard way of doing things and is part of the business culture. I have sometimes been told it's unprofessional or looks bad to run things yourself instead of using a managed service.
There is this weird thing that happens with hyperscale - the combination of highly central decision-making, extreme interconnection / interdependence of parts, and the attractiveness of lots of money all conspire to create a system pulled by unstable attractors to a fracturing point (slowed / mitigated at least a little by the inertia of such a large ship).
Are smaller scale services more reliable? I think that's too simple a question to be relevant. Sometimes yes, sometimes no, but we know one thing for sure - when smaller services go down the impact radius is contained. When a corrupt MBA who wants to pump short term metrics for a bonus gains power, the damage they can do is similarly contained. All risk factors are boxed in like this. With a hyperscale business, things are capable of going much more wrong for many more people, and the recursive nature of vertical+horizontal integration causes a calamity engine that can be hard to correct.
Take the financial sector in '08. Huge monoliths that had integrated every kind of financial service with every other kind of financial service. A few enormous points of failure, with every failure mode exposed to every other failure mode.
There's a reason asymmetric warfare is hard for both parties - cellular networks of small units that can act independently are extremely fault tolerant and robust against changing conditions. Giants, when they fall, do so in spectacular fashion.
Have you considered that a widespread outage is a feature, not a bug?
If AWS goes down, no one will blame you for your web store being down as pretty much every other online service will be seeing major disruptions.
But when your super small provider goes down, it's now your problem and you'd better have some answers ready for your manager. And you'll still be affected by the AWS outage anyway, as you probably rely on an API that runs on their cloud!
> Have you considered that a widespread outage is a feature
It's a "feature" right up there with planned obsolescence and garbage culture (the culture of throw-away).
The real problem is not having a fail-over provider. Modern software is so abstracted (tens, hundreds, even thousands of layers), and yet we still make the mistake of depending on one or two layers to make things "go".
When your one small provider goes down, no problem, switch over to your other provider. Then laugh at the people who are experiencing AWS downtime...
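The core idea is just ordinary fallback logic. Here is a minimal sketch, with hypothetical provider names and a stubbed-out "network call" standing in for real requests (actual multi-provider failover also needs health checks, DNS handling, and replicated state, which is where the real cost lies):

```rust
// Minimal failover sketch: try the primary provider, fall back to the
// secondary on error. fetch_from is a stand-in for a real request; in
// this demo the primary is "down".
fn fetch_from(provider: &str) -> Result<String, String> {
    if provider == "primary" {
        Err(String::from("primary unreachable"))
    } else {
        Ok(format!("response from {}", provider))
    }
}

fn fetch_with_failover() -> Result<String, String> {
    // or_else only runs the fallback if the first call failed.
    fetch_from("primary").or_else(|_| fetch_from("secondary"))
}

fn main() {
    match fetch_with_failover() {
        Ok(body) => println!("{}", body),
        Err(e) => eprintln!("all providers failed: {}", e),
    }
}
```

The hard part is everything around this sketch: keeping both providers' data in sync and deciding when the primary is actually "down" rather than just slow.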
That just leads to an upstream single point of failure.
Very few online services are so essential that they require a fail-over plan for an AWS outage, so this is just plain over-engineering.
> Then laugh at the people who are experiencing AWS downtime...
Let's not stroke our egos too much here, mkay?
Depends on your customers understanding that. We had a gym with 'smart' Pilates machines that went down. Hard to explain to them that the cloud was involved.
> Smaller providers have a much stronger incentive to be reliable, as they will lose customers if they are not.
Hard disagree. A smaller provider will think twice about whether to use a Tier I data center versus a Tier IV data center, because the cost difference is substantial and in many cases prohibitive.
This. There's a fundamental logic error here. You simply don't hear about downtimes at smaller providers that often because it doesn't affect a significant portion of the internet like it does e.g. for AWS. But that doesn't mean they are more stable in general.
yeah, I'd like to see hard data on uptime / reliability between these two classes of provider before declaring that big = bad and small = good.
FlyIO (and Digital Ocean) had horrible uptime when they first got started. In the last 6-12 months, FlyIO has been much better. But they would go down all the time or have unexpected CI bugs/changes.
Digital Ocean accidentally hard-deleted users' object stores before their IPO.
> A lot of people want the brand recognition too.
Not to mention the familiarity of the company, its services, and expectations. You can hire people with experience with AWS, Azure or GCP, but the more niche you go, the higher the possibility that some people you hire might not know how to work with those systems and their nuances. That's fine, they can learn as they work, but it adds to ramp-up time and could lead to inadvertent mistakes.
This could also be an anti-pattern for hiring - getting people with Amazing Web Service (tm) certification and missing out on candidates with a solid understanding of the foundational principles these services are built on
I agree, though the industry does this all the time by hiring someone with a degree over someone who built key infrastructure and has no degree, solely because of the degree. Remember, the creator of brew couldn't get past a Google interview because they asked him to hand-craft some algorithm; I probably would not have done well with that either. Does that make him or me a worse developer? Doubtful. Does it mean Google missed out on hiring someone who loves his craft? Yes.
I think that is often the perception, but is usually mistaken.
Smaller providers tend to have simpler systems so it only adds to ramp up time if you hire someone who only knows AWS or whatever. Simpler also means fewer mistakes.
If you stick to a simple set of services (e.g. VPS or containers + object storage) there are very few service specific nuances.
They also carry the risk of leaving the market entirely, leaving you to scramble to pick up the pieces.
I think Cloudflare has billions of dollars' worth of incentive to be reliable. However, they can still slip up; it happens, and that's why centralization is bad.
That is true.
However, I would say that the effect of this outage on customer retention will be (relatively) smaller than it would be for a smaller CDN.
Maybe? Maybe not? It depends on the nature of the outage and how motivated their customers are to switch over to a new service.
The good news is that we're just living in a perfect natural experiment:
Cloudflare just caused a massive internet outage costing millions of dollars worldwide, in part due to a very sloppy mistake that definitely ought to have been prevented (using Rust's `unwrap` in production). Let's see how many customers they lose because of that, and we'll see how big their incentives really are. (If you look at the evolution of their share price, it doesn't look like the incident terrified their shareholders, at least…)
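For context on why `unwrap` is risky: it turns any unexpected `Err` or `None` into a process-wide panic. A minimal sketch of the difference (the `parse_limit` function is hypothetical, not Cloudflare's actual code):

```rust
// Hypothetical config parsing: unwrap vs. explicit error handling.
fn parse_limit(raw: &str) -> usize {
    // Risky version: raw.parse::<usize>().unwrap() panics on bad
    // input, and an uncaught panic in a critical path takes the
    // whole process down.
    //
    // Safer version: handle the Err and fall back to a sane default.
    raw.parse::<usize>().unwrap_or(100)
}

fn main() {
    assert_eq!(parse_limit("25"), 25);          // valid input parses
    assert_eq!(parse_limit("not-a-number"), 100); // bad input: default, no panic
    println!("ok");
}
```

Whether a default is the right recovery depends on the system; the point is that the failure mode becomes a decision instead of a crash.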
That experiment already happened last year with CrowdStrike. Nothing detrimental happened: their revenue actually increased and their stock went up.
>I have sometimes been told its unprofessional or looks bad to run things yourself instead of using a managed service.
That's an incredibly bad take lol.
There are times when "The Cloud" makes sense, sure. But in my experience, companies over-use the cloud the majority of the time. On-prem is GOOD. It's cheaper, arguably more secure if you configure it right (a challenge, I know, but hear me out), and gives you data sovereignty.
I don't quite think companies realize how bad it would be if e.g. AWS was hacked.
Any data you have in the cloud is no longer your data. Not really. It's Amazon's, Microsoft's, Apple's, whoever's.
> I don't quite think companies realize how bad it would be if e.g. AWS was hacked.
I don't think they'd care. Companies only care about one thing: stock price. Everything rolls up into that. If AWS got hacked and said company was affected by it, it wouldn't be a big deal because they'd be one of many and they'd be lost in the crowd. Any hit to their stock/profits would be minimal and easily forgotten about.
Now, if they were on prem or hosted with Bob's Cloud and got hacked? Different story altogether.
> Companies only care about one thing: stock price.
It's rarely affected in any case. Take a look at the CrowdStrike price chart (or revenue, or profits). I think most people (including investors) just take it for granted that systems are unreliable and regard it as something you live with.
I think that's more an indicator that it hasn't affected their business. They lost nearly 1/5 of their stock price after that incident (obviously not accounting for other factors; I'm not a stock analyst). Investors thought they'd lose customers and reacted in obvious fashion.
But it's since been restored. According to the news, they lost very few customers over the incident. That is why their stock came back. If they'd continued having problems, I doubt it would have been so rosy. So yes, to your point, a blip here or there happens.
Configuring something on premises to match the capabilities of AWS or Azure or CloudFlare is very, very difficult and involves a lot of local money and expertise that often isn’t available at any affordable price.
>Configuring something on premises to match the capabilities of AWS or Azure or CloudFlare is very, very difficult and involves a lot of local money and expertise that often isn’t available at any affordable price.
A large number of cloud customers don't need the complexity the cloud can offer. Like, yes, it's hard to feature-replicate the cloud 1:1. But so many people just have some VMs and some routes.
I've actually tried Hetzner on and off with one server for the past 2 years and keep running into downtime every few months.
First I used an ex101 with an i9-13900. Within a week it just froze. It could not be reset remotely. Nothing in kern.log. Support offered no solution but a hard reboot. No mention of what might be wrong other than user error.
A few months later, one of the drives just disconnected from the RAID array by itself. It took support an hour to respond, and they said they found no issue, so it must be my fault.
Then I changed to a Ryzen-based server and it also mysteriously had problems like this. Again, support blamed the user.
It was only after I cancelled the server, several months later, that I saw this, so I know it isn't just me:
https://docs.hetzner.com/robot/dedicated-server/general-info...
> Smaller providers have a much stronger incentive to be reliable, as they will lose customers if they are not.
I disagree because, conversely, outages at larger providers cause millions or maybe even billions of dollars in losses for their customers. Those customers might be more "stuck" in their current provider's proprietary schemes, but losses like these will push them to move away, or at least to diversify across cloud providers. In turn, that costs the cloud provider revenue.
> Less complexity to go wrong.
This sounds like a good thing.
It is, in itself.
It does mean that you get fewer services, so you have to do more sysadmin internally or use other providers for those, which a lot of people are very reluctant to do.
I bet most people don't even need the extra features.
When forced to use AWS, I only use the extra features I am specifically told to use, or that are already in use, in order to make the system less tied to AWS and easier for me to manage (I am not an AWS specialist, so it's easier for me to just run stuff like I would on any server or VPS). I particularly dislike RDS (of the things I have used). I like Lightsail because it's reasonably priced and very much like just getting a VPS.
S3 is something of an exception, but it does not tie you down (everyone provides S3-compatible object storage now, and you can use S3 even if everything else is hosted elsewhere). It works for me for storing lots of large files that are not accessed very often (so egress fees are low).
Looking forward to the Show HN: I built a web site that uses all of AWS services.
That would be an expensive Show HN.
And they sell when they get big but can't afford to stay that way.