> IIRC the wisdom of the time cloud started becoming popular was to always be on-prem and use cloud to scale up when demand spiked.
I've heard that before but was never able to make sense of it. Overflowing into the cloud seems like a nightmare to manage; wouldn't overbuilding on-prem be cheaper than paying your infra team to straddle two environments?
I've set up on-prem environments that would automatically and transparently spill workload to the cloud when we ran out of on-prem capacity. The prerequisite is that your operations people have strong automation skills; there can't be any human button-pushers in the loop. But it really isn't that much work, nor does it require unusual technical skill, if you have a competent operations environment. Once everything is wired up, it mostly just works. Even the devs often don't know whether their workloads are running on-prem or in the cloud.
An under-rated aspect of this is that it provides detailed data for planning the expansion of the on-prem environment, including changes in how the hardware is being used.
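To make "the devs don't know where it runs" concrete, the dispatch side can be as simple as a capacity check in front of two submit paths. A minimal Python sketch, assuming a job-queue model; on_prem_free_slots, submit_on_prem, and submit_cloud are hypothetical stubs for whatever your scheduler and cloud API actually expose:

    import logging

    log = logging.getLogger("dispatcher")

    def on_prem_free_slots() -> int:
        """Query the on-prem scheduler for unused capacity (stub)."""
        raise NotImplementedError

    def submit_on_prem(job: dict) -> str:
        """Submit to the on-prem cluster, return a job id (stub)."""
        raise NotImplementedError

    def submit_cloud(job: dict) -> str:
        """Submit the same container/image to a cloud node pool (stub)."""
        raise NotImplementedError

    def dispatch(job: dict) -> str:
        """Prefer on-prem; spill to cloud only when local capacity runs out."""
        if on_prem_free_slots() >= job.get("slots", 1):
            placement, job_id = "on-prem", submit_on_prem(job)
        else:
            placement, job_id = "cloud", submit_cloud(job)
        # Recording every placement decision is what produces the
        # capacity-planning data mentioned above.
        log.info("job=%s placement=%s", job_id, placement)
        return job_id

The log line that records each placement is where the planning data comes from: count how often and why you spill, and you know what hardware to buy next.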
My experience is that this is a good model. On-prem is about 1/3 the cost of the cloud, fully burdened, but being able to transparently spill to the cloud mitigates the capacity and supply chain risks that come with on-prem.
It is effectively a cheap insurance policy.
As someone with experience at a company that did hybrid, I’ll say: it only makes sense if your infra team deeply understands computers.
The end state is “just some IaC,” wherein it doesn’t really matter to anyone where the application lives, but all of the underlying difficulties in getting to that state necessitate that your team actually, no-shit knows how distributed systems work. They’re going to be doing a lot of networking configuration, for one, and that’s a whole speciality.
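For a sense of what "just some IaC" ends up looking like: the service definition names no provider, and placement is resolved at deploy time. This is an illustrative Python sketch, not any particular tool's API; ServiceSpec, resolve_placement, and render are made-up names:

    from dataclasses import dataclass

    @dataclass
    class ServiceSpec:
        name: str
        image: str
        replicas: int
        placement: str = "auto"   # "auto" | "on-prem" | "cloud"

    def resolve_placement(spec: ServiceSpec, on_prem_headroom: int) -> str:
        """Decide where the service runs; the developer never sees this."""
        if spec.placement != "auto":
            return spec.placement
        return "on-prem" if on_prem_headroom >= spec.replicas else "cloud"

    def render(spec: ServiceSpec, target: str) -> dict:
        """Emit the provider-specific config (on-prem manifest or cloud
        template) from the same neutral spec."""
        return {"service": spec.name, "image": spec.image,
                "replicas": spec.replicas, "target": target}

    if __name__ == "__main__":
        spec = ServiceSpec(name="billing-api", image="registry.local/billing:1.4", replicas=6)
        target = resolve_placement(spec, on_prem_headroom=4)  # spills: only 4 slots free
        print(render(spec, target))

All the hard parts live behind render(): the networking, identity, and image-distribution differences between the two targets, which is where that distributed-systems knowledge actually gets spent.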