AWS may be overcharging but it's a balancing act. Going on-prem (well, shared DC) will be cheaper but comes with requirements for either jack of all trades sysadmins or a bunch of specialists. It can work well if your product is simple and scalable. A lot of places quietly achieve this.

That said, I've seen real world scenarios where complexity is up the wazoo and an opex cost focus means you're hiring under skilled staff to manage offerings built on components with low sticker prices. Throw in a bit of the old NIH mindset (DIY all the things!) and it's large blast radii with expensive service credits being dished out to customers regularly. On a human factors front your team will be seeing countless middle of the night conference calls.

While I'm not 100% happy with the AWS/Azure/GCP world, the reality is that on-prem skillsets are becoming rarer and more specialist. Hiring good people can be either really expensive or a bit of a unicorn hunt.

It's a chicken and egg problem. If the cloud didn't become such a proeminent thing, the last decade and a half would have seen the rise of much better tools to manage on-premise servers (= requiring less in-depth sysadmin expertise). I think we're starting to see such tools appear in the last few years after enough people got burned by cloud bills and lockin.

And don't forget the real crux of the problem: Do I even know whether a specialist is good or not? Hiring experts is really difficult if you don't have the skill in the topic, and if you do, you either not need an expert, or you will be biased towards those that agree with you.

It's not even limited to sysadmins, or in tech. How do you know whether a mechanic is very good, or iffy? Is a financial advisor giving you good advice, or basically robbing you? It's not as if many companies are going to hire 4 business units worth of on prem admins, and then decide which one does better after running for 3 years, or something empirical like that. You might be the poor sob that hires the very expensive, yet incompetent and out of date specialist, whose only remaining good skill is selling confidence to employers.

> Do I even know whether a specialist is good or not?

Of course but unless I misunderstood what you meant to say, you don't escape that by buying from AWS. It's just that instead of "sysadmin specialists" you need "AWS specialists".

If you want to outsource the job then you need to go up at least 1 more layer of abstraction (and likely an order of magnitude in price) and buy fully managed services.

This only gets worse as you go higher in management. How does a technical founder know what good sales or marketing looks like? They are often swayed by people who can talk a good talk and deliver nothing.

The good news with marketing and sales is that you want the people who talk a good talk, so you're halfway there, you just gotta direct them towards the market and away from bilking you.

I'm proudly 100% on prem Linux sys admin. There are not openings for my skills and they do not pay as well as whatever cloud hotness is "needed".

Nobody is hiring generalists nowadays.

At the same time, the incredible complexity of the software infrastructure is making specialists more and more useless. To the point that almost every successful specialist out there is just some disguised generalist that decided to focus their presentation in a single area.

Maybe everyone is retaining generalists. I keep being given retention bonuses every year, without asking for a single one so far.

As mentioned below, never labeled "full stack", never plan on it. "Generalist" is what my actual title became back in the mid 2000s. My career has been all over the place... the key is being stubborn when confronted with challenges and being able to scale up (mentally and sometimes physically) to meet the needs, when needed. And chill out when it's not.

> Nobody is hiring generalists nowadays.

What?

I throw up in my mouth every time I see "full stack" in a job listing.

We got rid of roles... DBA's, QA teams, Sysadmins, then front and back end. Full Stack is the "webmaster" of the modern era. It might mean front and back end, it might mean sysadmin and DBA as well.

Even full stack listings come with a list of technologies that the candidate must have deep knowledge of.

> We got rid of roles... DBA's, QA teams, Sysadmins, then front and back end.

On a first approximation, those roles were all wrong. If your people don't wear many of those hats at the same time, they won't be able to create software.

But yeah, we did get rid of roles. And still require people to be specialized to the point it's close to impossible to match the requirements of a random job.

That's the crazy thing.

Most AWS-only Ops engineers I know are making bank and in high demand, and Ops teams are always HUGE in terms of headcount outside of startups.

The "AWS is cheaper" thing is the biggest grift in our industry.

I think this is driven by the market itself and the way cloud promotes their product.

After fully in cloud for sometimes, we’re moving to hybrid solutions. The upper management happy with costs and the cloud engineer had new toy's

1. large, homogenous domain where the budget for your department is large

2. niche, bespoke domain primarily occupied by companies looking to cut costs

I wonder how vibe coding will impact this.

You can easily get your service up by asking claude code or whatever to just do it

It produces aws yaml that’s better than many devops people I’ve worked with. In other words, it absolutely should not be trusted with trivial tasks, but you could easily blow $100K’s per year for worse.

I've been contemplating this a lot lately, as I just did code review on a system that was moving all the AWS infrastructure into CDK, and it was very clear the person doing it was using an LLM which created a really complicated, over engineered solution to everything. I basically rewrote the entire thing (still pairing with Claude), and it's now much simpler and easier to follow.

So I think for developers that have deep experience with systems LLMs are great -- I did a huge migration in a few weeks that probably would have taken many months or even half a year before. But I worry that people that don't really know what's going on will end up with a horrible mess of infra code.

To me it's clear that most Ops engineers are vibe coding their scripts/yamls today.

The time difference between having a script ready has decreased dramatically in the last 3 years. The amount of problems when deploying the first time has also increased in the same period.

The difference between the ones who actually know what they're doing and the ones who don't is whether they will refactor and test.

> AWS may be overcharging but it's a balancing act. Going on-prem (well, shared DC) will be cheaper but comes with requirements for either jack of all trades sysadmins or a bunch of specialists

Much easier to find. Even more, they are skills much easier to learn for existing engineers. What's better, they are fundamental skills that will never lose their value as those systems are what everything else is built on.

Managed servers reduce the on-prem skillset requirement and can also deliver a lot of value.

The most frustrating part of hyperscalers is that it's so easy to make mistakes. Active tracking of you bill is a must, but the data is 24-48h late in some cases. So a single engineer can cause 5-figure regrettable spend very quickly.

What size companies are we talking about

I've seen this in startups with 7 figure ARR (where annual cloud costs were also 7 figures).

Also seen that in F500 where a single architect caused a 5-figure mistake which remove cloud privileges from the entire architecture team. Can't make it up.

I mean 5 figures is nothing to most companies when your spending 10s of millions on aws.

7 figures is single digit millions. 5 figures is a single silly mistake. Just enabling more verbosity on logs is enough to trigger that in a day. Such mistakes would be found monthly. We could have easily had 10% more engineers with the same budget if it weren't for lighting our runway on fire that way.

[deleted]

It depends upon how many resources your software needs. At 20 servers we spend almost zero time managing our servers, and with modern hardware 20 servers can get you a lot.

Its easier than ever to do this but people are doing it less and less.