Just wanted to say, thanks for doing this! Now the old rant...

I started my career when on-prem was the norm and remember so much trouble. When you have long-lived hardware, eventually, no matter how hard you try, you just start to treat it as a pet and state naturally accumulates. Then, as the hardware stops being good enough, you need to upgrade. There's an internal team that presents the "commodity" interface, so you have to pick your new hardware from their list and get the cost approved (it's a lot harder to just spend a little more and get a little more). Then your projects get delayed while they rack the new hardware and you properly "un-pet" your pets so they can respawn on the new devices, etc.

Anyways, when cloud came along, I was like, yeah we're switching and never going back. Buuut, come to find out that's part of the master plan: it's a no-brainer good deal until you and everyone in your org/company/industry forgets HTF to rack their own hardware, and then it starts to go from no-brainer to brainer. And basically unless you start to pull back and rebuild that muscle, it will go from brainer to no-brainer bad deal. So thanks for building this muscle!

we're in a pretty unique situation in that very early on we fundamentally couldn't afford the hyperscaler clouds to cover operations, so we were forced to develop some expertise. turned out to be reasonably chill and we'll prob stick with it for the foreseeable future, but we have seen a little bit of the state-creep you mention, so tbd.

Yeah, from memory on-prem was always cheaper; the cloud just removed a lot of logistical obstacles and made everything convenient under one bill.

IIRC the wisdom at the time cloud started becoming popular was to always be on-prem and use cloud to scale up when demand spiked. But over time, temporarily scaling up became permanent, devs became reliant on instantly spawning new machines for things other than demand spikes, and now everyone defaults to cloud and treats it as the baseline. In the process we lost the grounding needed to assess the real cost of things, and predictably the cost difference between cloud and on-prem has only widened.

> IIRC the wisdom at the time cloud started becoming popular was to always be on-prem and use cloud to scale up when demand spiked.

I've heard that before but was never able to make sense of it. Overflowing into the cloud seems like a nightmare to manage; wouldn't overbuilding on-prem be cheaper than paying your infra team to straddle two environments?

I've set up on-prem environments that would automatically and transparently spill workload to the cloud when we ran out of on-prem capacity. The prerequisite is that your operations people need strong automation skills (there can't be any human button-pushers in the loop), but it really isn't that much work, nor does it require unusual technical skill if you have a competent operations environment. Once everything is wired up, it mostly just works. Even the devs often don't know whether their workloads are running on-prem or in the cloud.

An underrated aspect of this is that it provides detailed data for planning the expansion of the on-prem environment, including changes in how the hardware is being used.
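The core of it can be sketched roughly like this (a simplified illustration with made-up names, not anyone's actual infrastructure): a dispatcher checks free on-prem capacity before each placement, spills to a cloud pool otherwise, and logs every spill event, which is exactly the data that feeds capacity planning.

```python
# Simplified sketch of spill-over placement (hypothetical names, not real infra APIs).
# Idea: always prefer on-prem capacity, fall back to a cloud pool when full,
# and record every spill so the data can drive on-prem expansion planning.

import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dispatcher")


@dataclass
class Job:
    name: str
    cpus: int


class Pool:
    """A pool of capacity: the on-prem cluster or a cloud account."""

    def __init__(self, name: str, total_cpus: int):
        self.name = name
        self.total_cpus = total_cpus
        self.used_cpus = 0

    def has_room(self, job: Job) -> bool:
        return self.used_cpus + job.cpus <= self.total_cpus

    def run(self, job: Job) -> None:
        # Placeholder for whatever actually schedules the work (k8s, Nomad, a batch queue).
        self.used_cpus += job.cpus
        log.info("running %s on %s (%d/%d cpus used)",
                 job.name, self.name, self.used_cpus, self.total_cpus)


def dispatch(job: Job, on_prem: Pool, cloud: Pool) -> str:
    """Place a job on-prem if it fits, otherwise spill to the cloud pool."""
    if on_prem.has_room(job):
        on_prem.run(job)
        return on_prem.name
    # Every spill event is data for capacity planning.
    log.warning("spill: %s did not fit on-prem, sending to %s", job.name, cloud.name)
    cloud.run(job)
    return cloud.name


if __name__ == "__main__":
    on_prem = Pool("on-prem", total_cpus=64)
    cloud = Pool("cloud", total_cpus=10_000)  # effectively unbounded
    for i in range(10):
        dispatch(Job(name=f"batch-{i}", cpus=16), on_prem, cloud)
```

In practice this logic sits behind whatever scheduler is already in use, but the placement decision itself stays about this small.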

My experience is that this is a good model. On-prem is about 1/3 the cost of the cloud, fully burdened, but being able to transparently spill to the cloud mitigates the capacity and supply-chain risks that come with on-prem.

It is effectively a cheap insurance policy.

As someone with experience with a company that did hybrid, I’ll say: it only makes sense if your infra team deeply understands computers.

The end state is “just some IaC,” wherein it doesn’t really matter to anyone where the application lives, but all of the underlying difficulties in getting to that state necessitate that your team actually, no-shit knows how distributed systems work. They’re going to be doing a lot of networking configuration, for one, and that’s a whole speciality.

Docker is amazing for forcing machines not to be pets. Seriously, a racked server is just another K3s or K8s node (or whatever) and doesn't get the choice or ability to be petted. It's so nice. You could maybe have said the same about VMs, but not really; the VM just became the pet. OK, you could at least image/snapshot it, but it's not the same.

It’s interesting how everyone has different experiences, and how those experiences drive what they do.

I would never dream of running Docker in production. It seems so overly complicated. Also, since day one, I could never understand using a public registry for mission critical stuff. When I was learning Docker, I would unplug the network cable so I wouldn’t accidentally push my container online somewhere with all my data.

I totally get the concept at scale. I also get the concept of just shipping an application in a container. I also get the concept of self-hosting of just give me the container so I don’t have to think about how it all works.

However: the complexity of building the container, cleanup, deleting entries, environment variables, no SSH availability (even on Railway in the beginning), and the ambiguity about where your container even needs to be to get it somewhere. Public registry or private registry?

Certainly most of it is my own lack of knowledge from not sticking with it.

Just give me a VM and some firewall rules. Cloning VMs can be automated in so many different ways.
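For instance, something as small as this covers the common case (a hypothetical sketch assuming libvirt with virt-clone installed and an existing base domain named "base-vm"):

```python
# Hypothetical example: cloning a libvirt VM by shelling out to virt-clone.
# Assumes virt-clone is installed and a base VM named "base-vm" already exists.
import subprocess


def clone_vm(base: str, new_name: str) -> None:
    """Clone an existing libvirt domain, letting virt-clone pick disk paths."""
    subprocess.run(
        ["virt-clone", "--original", base, "--name", new_name, "--auto-clone"],
        check=True,
    )


clone_vm("base-vm", "web-02")
```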

/rant

I agree with you, but I also don’t do webdev. Docker seems neat; I’ve dabbled just to understand the stack and how it works. Not much of a container person, I realized. Closest I get is spinning out Yocto images and flashing them to hardware. Same image every time, with updated versions of software. So maybe kind of like Docker in the abstract sense, I suppose.

I’m mid-way through my career at this point and I’m hoping to skip the web/cloud bit; I just don’t enjoy it as much as I do hw/sw integrations.

I've found Docker is a bit of a monstrous pet.

Docker is a monster that you have to treat as a pet. You've still got to pet it through stages of updating, monitoring, snapshots and networking. When the internal system breaks it's no different to a server collapsing.

Snapshots are a haircut for the monster, useful but can make things worse.

Not in my experience; it's super easy to set up a K3s cluster in a single rack. Certainly less hassle than VMware or Xen ever was.

I find the same with Systemd and Docker.

Some can tame the beast, for me it's always a fight until one of us holds down the power button.

I'm not OP, but thanks for this. Like I mentioned in another comment, the wholesale move to the cloud has caused so many skills to atrophy. And it's good that someone is starting to exercise that muscle again, like you said. The hyperscalers are mostly to blame for this, the marketing FUD being that you can't possibly do it yourself, there are too many things to keep track of, just let us do it (while conveniently leaving out how eye-wateringly expensive they are in comparison).

The other thing the cloud does not let you do is make trade-offs.

Sometimes you can afford not to have a triple-redundant 1000GB network, or a simple single machine with RAID may have acceptable downtime.

yeah this

it means that even after negotiating much better terms than baseline, we run into the fact that cloud providers just have a higher cost basis for their more premium/general product.

Wanna see us do it again?