What I struggle with is developers wanting to leave platforms like Datadog for open source equivalents that need to be self-hosted.

I hear all about the cost-savings benefits, but I never see the team factoring in their own time (and others' time) needed to set up and maintain these systems reliably long term.

Something ICs at a company often struggle to understand is why companies often prefer to buy managed solutions even when “free” alternatives exist (read: the free alternatives are also expensive, just a different type of cost).

My log bill for Google Cloud logging would be like 30k. For Splunk, like 80k. I self-host for 1.5k per month. Spend maybe an hour a month? Easiest money I ever made.
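To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the 30k figure is a monthly bill, and the hourly rate is a made-up placeholder for a loaded engineer-hour, so plug in your own. The function names are just for illustration.

```python
def monthly_self_host_cost(infra_usd, hours, hourly_rate_usd):
    """Infrastructure spend plus the engineering time spent on upkeep."""
    return infra_usd + hours * hourly_rate_usd


def break_even_hours(managed_usd, infra_usd, hourly_rate_usd):
    """Upkeep hours per month at which self-hosting costs as much as the managed bill."""
    return (managed_usd - infra_usd) / hourly_rate_usd


MANAGED_USD = 30_000    # quoted managed bill, ASSUMED to be per month
INFRA_USD = 1_500       # quoted self-hosting spend per month
HOURLY_RATE_USD = 150   # ASSUMPTION: loaded cost of one engineer-hour

print(monthly_self_host_cost(INFRA_USD, hours=1, hourly_rate_usd=HOURLY_RATE_USD))  # 1650
print(break_even_hours(MANAGED_USD, INFRA_USD, HOURLY_RATE_USD))  # 190.0
```

Under those assumptions the upkeep would have to eat roughly 190 hours a month before the managed bill wins on cost alone. It says nothing about on-call risk, which is a separate argument.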

When you’re in the middle of a production down event and your whole team is diagnosing the issue, and your log server is unresponsive, who do you contact for support?

No one, you pull an engineer off the production issue to debug the log server, because you need the log server to debug the production servers.

See the problem?

Edit: to be clear I’m no fan of Datadog and I wish self hosting were an option. I want this path for our company, but at least on our team we just don’t have enough (redundant) expertise to deploy and manage these systems. We’d have to hire an extra FTE.

If you’re having a correlated outage like that, then it’s likely you fix the prod issue before the cloud engineers at some giant cloud company even respond to an internal escalation, much less fix an issue. More than likely your prod issue is causing the logging problem.

If you mean you are experiencing two totally unrelated issues at the same time, then I don’t think that’s a reasonable thing to really assign much value to as it’s incredibly unlikely.

Half of $30k/mo trivially pays for an engineer you hire to only manage such a cluster for you and just works an hour a week unless a pager goes off if you truly need that level of peace of mind. If you’re hiring for such a position I have a few rock star level folks who would love such a job.

The hypothetical problems people imagine for on-prem infrastructure get really strange to me. I could come up with the same sort of scenarios for cloud based SaaS infrastructure just as easily.

> I don’t think that’s a reasonable thing to really assign much value to as it’s incredibly unlikely.

In my experience the systems/tools needed to debug production issues are often only used when they’re needed.

Which now means you need health and uptime monitoring on your log server since without that, it might break randomly and no one notices until you need it.
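For what it's worth, that kind of check does not have to be elaborate. Below is a minimal sketch using only the Python standard library. The endpoint URL in the comment is hypothetical, `is_healthy` and `should_alert` are names invented for this example, and the "three failures in a row" rule is just one assumed policy, not a recommendation from anyone in this thread.

```python
import urllib.request


def is_healthy(url, timeout_s=5.0, fetch=urllib.request.urlopen):
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with fetch(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts are all OSError subclasses.
        return False


def should_alert(results, threshold=3):
    """Alert only after `threshold` consecutive failed checks, so one blip does not page anyone."""
    recent = results[-threshold:]
    return len(recent) == threshold and not any(recent)


# Example wiring (hypothetical URL; run it from cron or any scheduler you already have):
#   history.append(is_healthy("http://logs.internal.example/ready"))
#   if should_alert(history):
#       page_someone()  # hypothetical hook into your paging tool
```

The catch the parent comment is pointing at still applies: this check has to run somewhere that does not share a failure mode with the log server itself.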

> The hypothetical problems people imagine for on-prem infrastructure get really strange to me

It really comes down to the people and whether you have the expertise on the team. And whether the team can realistically manage the system long term. It’s typically safer to spend more money for the managed service.

(It’s a safer decision, not necessarily better)

> It really comes down to the people and whether you have the expertise on the team

Aren't these people supposed to debug and fix complex problems in prod? And if they can do that, why can't they run and debug a log server?

Of course there are trade-offs with any outsourcing decision. But I think we should have higher expectations of engineers.

I don’t think it’s necessarily safer or better for anything but your job security.

100% agree. If I am using a cloud log provider, I wouldn't expect them to solve my logging issue(s) as fast as I need. More importantly, I have no real way to put more resources on that fix.

Beyond that, with a third-party service I'd be very surprised if both went down at the same time and it wasn't a further upstream issue like AWS. If it's my own logging service and it went down during a prod outage, I likely didn't properly isolate my logging service in the first place.

> Half of $30k/mo trivially pays for an engineer you hire to only manage such a cluster for you and just works an hour a week unless a pager goes off if you truly need that level of peace of mind. If you’re hiring for such a position I have a few rock star level folks who would love such a job.

1 person? Is that person always on call?

Yep, absolutely. I’ve come up with the term “man on the mountain” for such positions.

It’s when one person is exceedingly talented at exactly one thing - but isn’t exactly a typical employee who is good at or interested in doing much else other than keeping that one thing online and reliable.

Their job is to go live on their mountain for weeks or months at a time without so much as doing anything other than keeping their phone on and answering it within the first couple rings regardless of when called. If they are good at their job you likely don’t even need to call - they already know it’s broken before you do.

I’ve employed a few such folks over my career. They tend to be the “alternative” style candidate - exceptional people with exceptional flaws. They love the simple tradeoff.

That said, of course, this is ignoring bus factor and oversimplifying things. Typically this is one deep subject matter expert who sits off on the side of a small team, so there is at least one “understudy” hanging around as well.

I still advocate for such positions when they make sense though. I would much rather in-house my own “insurance” vs overpay some giant company each month only to find out the insurance didn’t exist when I needed to make a claim. It’s certainly more risk to my career - but I have very strong feelings that as a manager or executive my job is NOT to cover my own ass because it’s easier.

The old argument for being locked in to legacy software costing 6-8 figures a year was that you had no choice. Now you have a choice! Clearly that is better, and everyone should evaluate that choice on its merits, and the stock market sees that people are voting with their dollars. If your whole sales pitch is "good luck when it breaks!" you might want to reevaluate your business model.

The stock market is trying to predict that people will vote with their dollars in the future. I’m not quite sure people are really replacing enterprise SaaS at large corporations yet. It’s more of a projection.

Fair, however at some point in a company's size/spending, the complexity of integrating with a SaaS becomes as large as that of running your own open source tool.

Beyond that, and I'm aware this is very much application/company dependent, there are plenty of SaaS companies that offer horrendous or no support no matter what you pay. We used to use Splunk for monitoring and logging. Paid a ton of money because we were handling financial data and needed traceability and reliability. We constantly had to put out fires that were caused by their unreliable platform. It was not a good experience.

Ultimately, we jumped ship to Prometheus. We pay a fraction of the price and spend less time on it.

[deleted]

You don’t, you just look at the log like us old timers and solve the problem. It’s literally no different than solving the problem on the cloud.

Boogeyman

Have you ever tried to contact their support?

The problem is all these SaaS companies have cut costs so much that all their support has been reduced to useless offshore at best and at worst a chatbot. They do go down and don't work and often times there's simply nothing you can do. The worst offenders will seize upon the moment and force you to upgrade a support plan before they will even talk to you, even if the issue is their own making.

Unless you're a huge customer and already paying them tons of money, expect to receive no support. Your only line of defense if something happens and you're not a whale is that some whale is upset and they actually have their people working on the problem. If you're a small company, startup, or even mid-size, good luck on getting them to care. You'll probably be sent a survey when you don't renew and may eventually be a quotient in their risk calculus at some point in the distant future, but only if you represent a meaningful mass of customers they lost.

> The problem is all these SaaS companies have cut costs so much that all their support has been reduced to useless offshore at best and at worst a chatbot.

Tremendous opportunity announcement!

If you are building a dev-focused SaaS, treat your support team exactly as they are: a key part of the product. Just like docs or developer experience, the support experience is critical.

Trouble is, it's hard to quantify the negative experience, though tracking word of mouth referrals or NPS scores can try.

Oh come on, nobody uses the cloud because of support! Let's be real now.

99% of the time a cloud migration is because of OpEx/CapEx accounting shenanigans.

This is the exception to the rule

Do they actually not understand that? They might just be fine with a system that makes them more useful.

How do you calculate the time spent on an internal tool like this, actually? (I’ve never been in management). Realistically your team inevitably will have some downtime, maybe some internal tool maintenance can be fit in there? I mean it obviously isn’t fully “free” but it also shouldn’t be “billed” at their full salary, right?

> How do you calculate the time spent on an internal tool like this, actually?

In broad strokes there's two ways. You can count it as an operational expense, or you can count it as capital (this takes more work to do but can have some advantages). If you count it as operations, it's just a big red pit you're throwing money into that you hope is offsetting a larger operational cost somewhere (but this can be hard to quantify). If you count it as capital, you're basically storing all of those hours as an "asset" which then loses value over time (it's kind of like the charge in a battery). The problem is you have to be able to show that this internal tool would, in the case of an acquisition or liquidation, be valued by the new owner at the value you're setting it at.

The problem there being that people are even more hesitant to trust somebody else's internal tool than they are to trust their own internal tool, so I've seen multiple managers think "I sunk a million dollars into this so it must be worth something" but in fact they were just running a jobs program for their team.
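To make the battery analogy concrete, here is a minimal sketch of the capital view. It assumes straight-line amortization, which is my simplification (the comment above only says the asset "loses value over time"), and every number in it is invented for illustration.

```python
def capitalized_value(hours, hourly_rate_usd, useful_life_months, months_elapsed):
    """Remaining book value of capitalized engineering hours under straight-line amortization."""
    initial = hours * hourly_rate_usd
    # Fraction of the "charge" left; never below zero once the useful life has passed.
    remaining_fraction = max(0.0, 1 - months_elapsed / useful_life_months)
    return initial * remaining_fraction


# ASSUMED inputs: 1,000 hours at $100/hour, written off over 36 months.
print(capitalized_value(1_000, 100, useful_life_months=36, months_elapsed=0))   # 100000.0
print(capitalized_value(1_000, 100, useful_life_months=36, months_elapsed=18))  # 50000.0
print(capitalized_value(1_000, 100, useful_life_months=36, months_elapsed=48))  # 0.0
```

The number this produces is only the book value. Whether a buyer would actually pay that for the tool is exactly the part the comment above says managers get wrong.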

> Realistically your team inevitably will have some downtime

What? My team wouldn't have any downtime even if we had 10x the amount of people.

If you work at a company where you have times where you don't have work to do, you should polish your resume because it means the company will go under.

Doing work is easy, not doing work is hard. It's trivial for any engineer to find stuff to do. The trick is doing the right stuff. Most software is bad and clunky, most requirements are wrong, and most of your customers, at best, tolerate your product.

I think most software companies need to be doing less. Deleting code, refining, and making their product genuinely useful as opposed to "able to technically contort to client needs".

Agreed here as well. If you gave me 10 devs for 3 years and zero new incoming requirements the backlog wouldn't even go down by 20%.

Agreed, our backlog is insane.

Because most of them aren't trained to think economically... how many people on the planet do you think are aware of the notion of opportunity cost?

>the free alternatives are also expensive, just a different type of cost

Not if you hire reasonably competent people. These days, for the vast majority of FOSS services, all you need is the ability to spin up a VPS and run a number of simple Docker/Podman Compose commands. It can't be that hard.

Ok so they cost you reasonably competent people. Those are expensive!

Only if your company already is lacking in the domain of competence of your engineers. If that is the case, either you have bigger problems to worry about, or your product probably isn't impressive enough to begin with to warrant an addition of complex, enterprise-grade SaaS tooling.

Or they're busy working on the core product and not screwing around on something that can be bought easily.

If there's ever any use case to leave an expensive SaaS for self hosted, you can find it at Datadog.

I'm sorry, but the number of companies that need something like Datadog is quite small compared to their 30,000+ customer count. Maybe 5,000 companies on Earth truly need something like Datadog; 80% of their customers would be perfectly fine with a self-hosted instance of Grafana.

Using an open source, self-hosted solution should be the industry standard, the encouraged position, by default. Our industry does not gain overall from using Datadog, only from truly open source solutions that use AGPL licenses, which allow everyone to move forward together + share lessons together + contribute together toward a common goal of better observability.

Why are we acting like it's hard to set up? This isn't the 1990s, it's 2026. Tooling has gotten quite good over the last decade.

Also, corporations stupidly spend money all the time, and they overspend too. I recently left a company that was paying Salesforce $10mil a year in licenses when only 8 people in the entire 3,000-person company were using it. I doubt that was the only instance across our industry, too. There is a massive amount of waste and graft in enterprise sales.

I honestly doubt that if you swapped in Grafana for 10,000 Datadog customers they would notice the difference.

> Why are we acting like it's hard to set up?

Because the current generation of “full stack” engineers are great at spinning up React apps, but struggle with infrastructure and systems management. It’s really not any more complicated than that.

On a typical 8 person engineering team, maybe 1 or 2 people will know how to deploy anything to the cloud if you’re lucky.

The expertise just isn’t there at most companies.

Expertise isn't there because people are outsourcing that sort of work to companies. I didn't know how to do much of anything, until I had to do it for work. Then learning everything became way easier.

Surely all the engineers that existed 20 years ago haven’t simply retired? At the time if you told someone you couldn’t set up your own server they’d ask you what kind of engineer you are then?

> Surely all the engineers that existed 20 years ago haven’t simply retired?

20 years ago we had 5 times fewer engineers. And most of those have moved into management, other fields, retired, work calm jobs for the government or boring companies, etc.

How many 40+ year old engineers do you see, especially when compared to 20-30 year old engineers?

I guess we really are living through the leetcode generation! D:

My experience matches that of cj. In fact, if you do mention anything outside the walled garden, you'll get weird looks and someone will ask "Why?" like you are going down a dangerous path.

Come to think of it, they are right. Why take all this ownership when it's the company that is going to pay for all of this and you can push these responsibilities to some third party overseas?
