Yeah, from memory on-prem was always cheaper. Cloud just removed a lot of logistical obstacles and made everything convenient under one bill.
IIRC the wisdom of the time cloud started becoming popular was to always be on-prem and use cloud to scale up when demand spiked. But over time the temporary scale-ups became permanent, and devs came to rely on instantly spawning new machines for things other than demand spikes. Now everyone defaults to cloud and treats it as the baseline. In the process we lost the grounding needed to assess the real cost of things, and predictably the cost difference between cloud and on-prem has only widened.
> IIRC the wisdom of the time cloud started becoming popular was to always be on-prem and use cloud to scale up when demand spiked.
I've heard that before but was never able to make sense of it. Overflowing into the cloud seems like a nightmare to manage. Wouldn't overbuilding on-prem be cheaper than paying your infra team to straddle two environments?
I've set up on-prem environments that automatically and transparently spill workload to the cloud when on-prem capacity runs out. The prerequisite is that your operations people have strong automation skills; there can't be any human button-pushers in the loop. But it really isn't that much work, nor does it require unusual technical skill, if you have a competent operations environment. Once everything is wired up, it mostly just works. Even the devs often don't know whether their workloads are running on-prem or in the cloud.
An under-rated aspect of this is that it provides detailed data for planning the expansion of the on-prem environment, including changes in how the hardware is being used.
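To make the spill-over idea concrete, here is a toy sketch of an "on-prem first, cloud when full" placement rule that also records what spilled, which is the kind of data that could feed capacity planning. It is not the setup described above: `Pool`, `Scheduler`, `place`, and the core counts are all hypothetical, and the cloud is modeled as unbounded for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """A fixed-size on-prem capacity pool (hypothetical unit: CPU cores)."""
    name: str
    capacity: int
    used: int = 0

    def try_allocate(self, cores: int) -> bool:
        # Allocate only if the request fits in the remaining capacity.
        if self.used + cores <= self.capacity:
            self.used += cores
            return True
        return False


@dataclass
class Scheduler:
    on_prem: Pool
    # Every job that overflowed to the cloud, kept for capacity planning.
    spill_log: list = field(default_factory=list)

    def place(self, job: str, cores: int) -> str:
        """Prefer on-prem; fall back to cloud only when on-prem is full."""
        if self.on_prem.try_allocate(cores):
            return "on-prem"
        # Assumption: the cloud side always has room in this toy model.
        self.spill_log.append((job, cores))
        return "cloud"

    def spilled_cores(self) -> int:
        # Total overflow demand: a rough signal for how much to expand on-prem.
        return sum(cores for _, cores in self.spill_log)


sched = Scheduler(on_prem=Pool("dc1", capacity=64))
placements = [sched.place(f"job-{i}", 24) for i in range(4)]
print(placements)
print(sched.spilled_cores())
```

In a real environment the decision would sit inside whatever orchestrator is already in use, and the hard parts (networking, identity, data locality) are not captured here at all.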
In my experience this is a good model. On-prem is about 1/3 the cost of the cloud, fully burdened, but being able to transparently spill to the cloud mitigates the capacity and supply chain risks that come with on-prem.
It is effectively a cheap insurance policy.
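A back-of-the-envelope way to look at the "insurance" framing, and at the earlier question of whether overbuilding on-prem would be cheaper: the sketch below compares three strategies for a simple two-level demand profile. Only the roughly 1/3 cost ratio comes from this thread. The demand numbers, the `monthly_costs` helper, and the simplification that on-prem is paid for as provisioned while cloud is paid for as used are all assumptions, and the operational cost of running two environments is ignored.

```python
from fractions import Fraction

# Cost per capacity unit per month (hypothetical units).
CLOUD_UNIT_COST = Fraction(3)
ON_PREM_UNIT_COST = Fraction(1)  # about 1/3 of cloud, per the estimate above


def monthly_costs(baseline, peak, peak_fraction):
    """Compare strategies when demand is `baseline` most of the time
    and `peak` for `peak_fraction` of the month."""
    extra = peak - baseline
    return {
        # Pay cloud rates for exactly what is used.
        "all_cloud": CLOUD_UNIT_COST * (baseline + peak_fraction * extra),
        # Provision on-prem for peak and pay for it all month.
        "overbuilt_on_prem": ON_PREM_UNIT_COST * peak,
        # Provision on-prem for baseline; rent the overflow from the cloud.
        "hybrid_spill": ON_PREM_UNIT_COST * baseline
        + CLOUD_UNIT_COST * peak_fraction * extra,
    }


costs = monthly_costs(baseline=100, peak=150, peak_fraction=Fraction(1, 10))
for name, cost in costs.items():
    print(name, cost)
# all_cloud 315
# overbuilt_on_prem 150
# hybrid_spill 115
```

Under this simplified model, spilling beats overbuilding whenever the extra capacity is needed less than about a third of the time (3 × peak_fraction < 1). Past that point, the overflow is really baseline demand and belongs on-prem.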
As someone with experience with a company that did hybrid, I’ll say: it only makes sense if your infra team deeply understands computers.
The end state is “just some IaC,” wherein it doesn’t really matter to anyone where the application lives, but all of the underlying difficulties in getting to that state necessitate that your team actually, no-shit knows how distributed systems work. They’re going to be doing a lot of networking configuration, for one, and that’s a whole speciality.