Hacker News

> monitoring

Most monitoring solutions support Postgres and don't actually care where your DB is hosted. Of course, this only helps if someone is actually looking at the metrics to begin with.

> backups

Plenty of options to choose from depending on your recovery time objective. From scheduled pg_dumps to WAL shipping to disk snapshots and a combination of them at any schedule you desire. Just ship them to your favorite blob storage provider and call it a day.
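To make the simplest of those options concrete, here's a rough sketch of a scheduled pg_dump job in Python. The database name, output directory and function names are all made up for illustration. The upload step is left as a comment because it depends entirely on which blob storage provider you pick.

```python
import datetime
import subprocess


def pg_dump_command(dbname, out_dir, now=None):
    """Build a pg_dump invocation that writes a timestamped custom-format archive.

    dbname and out_dir are placeholders; point them at your own setup.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    out_file = f"{out_dir}/{dbname}-{now:%Y%m%dT%H%M%SZ}.dump"
    # --format=custom produces an archive that pg_restore can read.
    cmd = ["pg_dump", "--format=custom", f"--file={out_file}", dbname]
    return cmd, out_file


def run_backup(dbname, out_dir):
    """Run pg_dump and return the path of the archive it wrote."""
    cmd, out_file = pg_dump_command(dbname, out_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if pg_dump fails
    # Ship out_file to your blob storage provider here (provider-specific, omitted).
    return out_file
```

Run `run_backup` from cron or a systemd timer at whatever interval matches the amount of data you can afford to lose. For tighter recovery objectives you'd layer WAL shipping on top rather than dumping more often.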

> scaling

That's the main reason I favor bare-metal infrastructure. Nothing on the cloud (at a price you can afford) can rival the performance of even a mid-range server, so scaling is effectively never an issue. If you're outgrowing that, the conversation is no longer about getting a bigger DB but about using multiple DBs and sharding at the application layer.
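For what it's worth, sharding at the application layer can start out as simple as hashing a key to pick a database. A minimal sketch, assuming a fixed list of shards; the DSNs and function names below are hypothetical:

```python
import hashlib

# Hypothetical connection strings, one per shard.
SHARD_DSNS = [
    "postgresql://db0.internal/app",
    "postgresql://db1.internal/app",
    "postgresql://db2.internal/app",
]


def shard_for(key: str, shard_count: int) -> int:
    """Map a key (e.g. a tenant or user ID) to a shard index.

    Uses a stable hash so every app server agrees on the mapping.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def dsn_for(key: str) -> str:
    """Return the connection string of the shard that owns this key."""
    return SHARD_DSNS[shard_for(key, len(SHARD_DSNS))]
```

The catch with plain modulo is that changing the number of shards remaps most keys, so you'd either over-provision the shard count up front or keep an explicit key-to-shard lookup table instead.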

> failover still needs to happen

Yes, get another server and use Patroni/etc. Or just accept the occasional downtime and up to 15 mins of data loss if the machine never comes back up. You'd be surprised how many businesses are perfectly fine with this. Case in point: two major clouds had hour-long downtimes recently and everyone basically forgot about it a week later.

> If you bring that expertise in house

Infrastructure should not require continuous upkeep/repair. You wouldn't buy a car that requires you to have a full-time mechanic in the passenger seat at all times. If your infrastructure requires this, you should ask for a refund and buy from someone who sells more reliable infra.

A server will run forever once set up unless hardware fails (and some hardware can be made redundant, with spares provisioned ahead of time to take over automatically and delay maintenance operations). You should spend a couple of hours a month at most on routine maintenance, which can be outsourced and would still beat the cloud price.

I think you're underestimating the amount of tech all around you that is essentially *nix machines and somehow just... works despite having zero upkeep or maintenance. Modern hardware is surprisingly reliable, and most outages are caused by operator error when people are (potentially unnecessarily) messing with stuff rather than by the hardware failing.