Self-hosting isn't "golden". If you are serious about the reliability of complex systems, you can't afford to have your own outages keep your own engineers from fixing them.
If you seriously have no external, low-dependency fallback, please at least document that fact now for the Big Postmortem.
The engineers can walk up to the system and do whatever they need to fix it. At least, that's how we self-host in the office. If your organisation hosts it far away then yeah, it's not self-hosted but remote-hosted.
> The engineers can walk up to the system and do whatever they need to fix it.
Including fabricating new RAM?
Including falling back to third-party hosting when relevant. One doesn't exclude the other.
My experience with self-hosting has been that, at least when you keep the services independent, downtime is no more common than in hosted environments, and you always know what's going on. Being able to customise solutions, or build workarounds in case of trouble, is a benefit you don't get when the service provider is significantly bigger than you are. It has pros and cons and also depends on the product (e.g. email delivery is harder than Mattermost message delivery, and self-hosting makes less sense if you need a service only once a year or so), but if you have the personnel capacity and a continuous need, I find hosting things oneself to be the best solution in general.
Including fallback to your laptop if nothing else works. I saved a demo once by just running the whole thing from my computer when the Kubernetes guys couldn't figure out why the deployed version was 403'ing. Just had to poke the touchpad every so often so it didn't go to sleep.
> Just had to poke the touchpad every so often so it didn't go to sleep
Unsolicited tip: next time, if you use macOS, just open the terminal and run `caffeinate -imdsu`.
I assume Linux/Windows have something similar built in (and if not built in, something that's easily available). For Windows, I know that the PowerToys suite of nifty tools (officially provided by Microsoft) has an Awake utility, but that's just one of many similar options.
You can just turn off automatic sleep/screen-off in Windows' native power settings.
If you self-host, you must keep spares on hand, at least in an enterprise environment.
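For anyone who would rather script this than remember the flags, here is a minimal sketch that wraps a demo command so the machine stays awake only while it runs. It assumes `caffeinate` is given a utility to run (it then holds its assertions until that utility exits) and uses a subset of the flags above. The Linux branch assumes a systemd-based distribution with `systemd-inhibit` available. The function names are made up for illustration.

```python
import subprocess
import sys


def keep_awake_command(command, platform=sys.platform):
    """Return `command` wrapped so the machine stays awake while it runs.

    `command` is a list of arguments, e.g. ["python3", "demo.py"].
    """
    if platform == "darwin":
        # -i: no idle sleep, -m: no disk idle sleep,
        # -d: no display sleep, -s: no system sleep (AC power only).
        return ["caffeinate", "-imds", *command]
    if platform.startswith("linux"):
        # Assumption: systemd-based system; blocks idle and sleep
        # for as long as the wrapped command is running.
        return ["systemd-inhibit", "--what=idle:sleep", *command]
    # Unknown platform: run the command unwrapped.
    return list(command)


def run_awake(command):
    """Run `command` under the keep-awake wrapper and return its exit code."""
    return subprocess.run(keep_awake_command(command), check=False).returncode
```

Calling `run_awake(["python3", "demo.py"])` (with a hypothetical `demo.py`) would then replace the touchpad poking on macOS or a systemd Linux box. On other platforms it just runs the command as-is.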
The key thing that AWS provides is the capacity for effectively unlimited redundancy. Everyone who is down because us-east-1 is down didn't learn the lesson of redundancy.
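To make "redundancy" concrete at the client level, here is a rough sketch of ordered failover across regions: try the primary, and move on to the next region when a call fails. The helper `call_with_failover` and the stand-in `fake_fetch` are hypothetical names, not part of any AWS SDK, and a real setup would also need the data to actually be present in the other region.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def call_with_failover(
    regions: Sequence[str], call: Callable[[str], T]
) -> Tuple[str, T]:
    """Try `call(region)` for each region in order; return the first success.

    Returns (region_that_answered, result). Raises RuntimeError if every
    region fails, listing each region's error.
    """
    errors: List[str] = []
    for region in regions:
        try:
            return region, call(region)
        except Exception as exc:  # sketch: treat any error as "region down"
            errors.append(f"{region}: {exc}")
    raise RuntimeError("all regions failed: " + "; ".join(errors))


def fake_fetch(region: str) -> str:
    """Hypothetical stand-in for a real per-region request."""
    if region == "us-east-1":
        raise ConnectionError("region unavailable")
    return f"ok from {region}"
```

With `call_with_failover(["us-east-1", "us-west-2"], fake_fetch)`, the first region raises and the second one answers. The region names are only examples.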
Some organizations’ leadership takes one look at the cost of redundancy and backs away. Paying for redundant resources most organizations can stomach. The network traffic charges are what push many over the edge of “do not buy”.
The cost of re-designing and re-implementing applications to synchronize data shipping to remote regions and only spinning up remote region resources as needed is even larger for these organizations.
And this is how we end up with these massive cloud footprints that are not much different from running fleets of VMs. Just about the most expensive way to use the cloud hyperscalers.
Most non-tech organizations cannot face the brutal reality of what properly leveraging hyperscalers involves. For Fortune-scale footprints, the migration often takes decades, and during it they spend 3-5 times more on selected areas than peers who handle those areas the old way. The end state is mostly spot-instance-resident, scale-to-zero, elastic, containerized services with excellent developer and operational troubleshooting ergonomics.
Active-active RDBMS - which is really the only feasible way to do HA, unless you can tolerate losing consistency (or the latency hit of running a multi-region PC/EC system) - is significantly more difficult to reason about, and to manage.
Except Google Spanner, I’m told, but AWS doesn’t have an answer for that yet AFAIK.
They do now: https://aws.amazon.com/rds/aurora/dsql/