The real serverless horror isn't the occasional mistake that leads to a single huge bill; it's the monthly creep. It's so easy to spin up a resource and leave it running. It's just a few bucks, right?

I worked for a small venture-funded "cloud-first" company, and our AWS bill was a sawtooth waveform. Every month the bill would creep up by a thousand bucks or so until it hit $20k, at which point the COO would notice and it would be all hands on deck until we got the bill back under $10k or so. Rinse and repeat, but over a few years I'm sure we wasted more money than many of the examples on serverlesshorrors.com, just a few $k at a time instead of in one lump.

This is really the AWS business model - call it the "Planet Fitness" model if you prefer. Really easy to sign up and start spending money, hard to conveniently stop paying it.

Sounds like your organization isn't learning from these periods of high bills. What led to the bill creeping up, and what mechanisms could be put in place to prevent them in the first place?

At only $20k a month, the work put into getting the bill back down probably costs more in man-hours than it saves - time which would presumably be better spent building profitable features that more than make up for the incremental cloud cost. Assuming, of course, that the low-hanging fruit like oversized instances, unconstrained CloudWatch logs and undeleted volumes has all been taken care of.
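
(For concreteness, that low-hanging-fruit sweep is scriptable. A rough sketch with boto3, assuming default credentials/region and read-only EC2 and CloudWatch Logs access; it only prints findings and deletes nothing:)

    import boto3

    ec2 = boto3.client("ec2")
    logs = boto3.client("logs")

    # Volumes in the "available" state are attached to nothing
    # but are still billed per GB-month.
    for page in ec2.get_paginator("describe_volumes").paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    ):
        for vol in page["Volumes"]:
            print(f"unattached volume {vol['VolumeId']} ({vol['Size']} GiB)")

    # Log groups without a retention policy keep every log line forever.
    for page in logs.get_paginator("describe_log_groups").paginate():
        for group in page["logGroups"]:
            if "retentionInDays" not in group:
                print(f"log group with no retention: {group['logGroupName']}")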

> what mechanisms could be put in place to prevent them in the first place?

Those mechanisms would lead to a large reduction in their "engineering" staff and the loss of potential future bragging rights about how modern and "cloud-native" their infrastructure is, so nobody wants to implement them.
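
(For what it's worth, the most basic such mechanism is a spend alert, so someone hears about it at 80% of a budget rather than when the invoice lands. A rough AWS Budgets sketch via boto3; the budget name, limit, threshold and email address are all placeholders:)

    import boto3

    budgets = boto3.client("budgets")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    budgets.create_budget(
        AccountId=account_id,
        Budget={
            "BudgetName": "monthly-spend-guardrail",  # placeholder name
            "BudgetLimit": {"Amount": "10000", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[{
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80,  # percent of the limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL",
                             "Address": "finance@example.com"}],
        }],
    )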

You don't think this happens on prem? Servers running an application that is no longer used?

Sure, they're probably VMs, but their cost isn't zero either.

With that model, your cost doesn't change, though. When/if you find you need more resources, you can (if you haven't been doing so) audit existing applications to clear out cruft before you purchase more hardware.

The cost of going through that list often outweighs the cost of the hardware, by a lot.

And in a lot of cases it's hard to find out whether a production application can be switched off. Since the cost of an unused application is typically small, I don't know if there are many people willing to risk being wrong.

People always say stuff like this, and I just don’t buy it. It’s not that hard to analyze network traffic to see what does and doesn’t have active connections. When you’re relatively certain, shut it off for a week. If no one screams, delete it. If a month later someone is screaming, it’s their own damn fault for having no docs on something idle 90% of the time.
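
(The "see what has active connections" step can be as simple as a snapshot taken on the host itself. A rough sketch using psutil, assuming you can run it on the server; it skips loopback and your own SSH session, may need elevated privileges on some platforms, and a single snapshot obviously won't catch periodic traffic:)

    import psutil

    # Established TCP connections, minus loopback and the SSH session
    # you are running this from.
    active = [
        c for c in psutil.net_connections(kind="tcp")
        if c.status == psutil.CONN_ESTABLISHED
        and c.raddr
        and not c.raddr.ip.startswith("127.")
        and c.laddr.port != 22
    ]

    for c in active:
        print(f"{c.laddr.ip}:{c.laddr.port} <-> {c.raddr.ip}:{c.raddr.port}")

    if not active:
        print("no established TCP connections right now")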

I've run many things that only ingest new data on a monthly basis. So, say, 29 days out of every month they would be idle.

Is it worth starting and stopping those kinds of things? Probably not?

If you turn off a VM running something like that because you didn't see any traffic for a day, are you going to explain that you shut it down to save a few dollars a month? I would very much like to see how that unfolds.
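
(For scale, the start/stop automation being weighed here is roughly the following, run on whatever schedule matches the ingestion window. A rough boto3 sketch; the "Schedule=monthly-ingest" tag is a made-up convention:)

    import boto3

    ec2 = boto3.client("ec2")
    TAG_KEY, TAG_VALUE = "Schedule", "monthly-ingest"  # hypothetical tag

    def tagged_instances(states):
        resp = ec2.describe_instances(Filters=[
            {"Name": f"tag:{TAG_KEY}", "Values": [TAG_VALUE]},
            {"Name": "instance-state-name", "Values": states},
        ])
        return [i["InstanceId"]
                for r in resp["Reservations"] for i in r["Instances"]]

    def stop_between_ingests():
        ids = tagged_instances(["running"])
        if ids:
            ec2.stop_instances(InstanceIds=ids)

    def start_for_ingest():
        ids = tagged_instances(["stopped"])
        if ids:
            ec2.start_instances(InstanceIds=ids)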

That's the equivalent of saying "just audit your cloud usage and remove stuff that's no longer used".