I'm certainly not at liberty to invite a random to cover on-call shifts for us, but here's some anecdata about things I've witnessed that made on-call suck.

We began with free food delivery over the weekend, and the expectation that you'd take a day off the next week ("unlimited" PTO policy). Eventually they stopped letting us do that, and now our "unlimited" PTO policy has an invisible limit, so that day off counts against it for the year.

Our monitoring and alerting is unusably noisy. Deviance is fully normalized. Our postmortems typically have a section stating that alerts were issued but ignored until customers began complaining. Attempts to cut the noise down to a sane level have all been defeated by the ever-present pressure to feature factory. TBF this is mostly an engineering self-own and I feel partially responsible for this outcome.

The on-call engineer does a shocking amount of manual labor to paper over bugs in the product and un-stick users who fall through the (many) cracks. It is effectively a T3 tech support rotation. We've taken steps to tone it down to mere triage and channel this into pressure against offending teams' timelines, but there's a huge amount of silent cultural resistance and no one is being held accountable when a feature increases support load. I suspect this issue alone would make most bigtech engineers quit.

For the (many) issues that require manual intervention, the on-call engineer cannot actually do anything unless 2 other engineers sign off on a PR (either to run a SQL query or to deploy some tool or bugfix to resolve the problem).

This is more specific to the product I work on, but the sheer amount of 3rd party services we rely on means that something is constantly acting up and there's not a lot we can do about it. Our API client code for each service we use typically contains _at least_ one service-specific hacky workaround to keep things running in the face of bad behavior.

The frontend team has no on-call rotation despite causing plenty of bugs of their own. Backend engineers are expected to triage what are clearly frontend problems. We stood up a lot of observability tooling for the frontend but it took years for them to even start to use it.

More than anything, it feels like the moment I stop championing the issue, everyone stops paying attention and the on-call experience reverts to the mean. Other on-call engineers just sort of stop boyscouting and let the chaos wash over them while focusing on sprint obligations (can't blame them), and leadership takes their eye off the ball to chase growth (also can't blame them). Hugely fucked lack of accountability and the buck eventually stops at whoever is the poor guy holding the pager that week.

I wonder what would happen if you sent this nearly verbatim to executive leadership. It is quite a thorough, candid description of a serious problem.

At least I certainly wouldn't be happy to learn that my product was bursting at the seams and nobody was being held accountable. But I'm not an executive leader. (Maybe that's why?)

All of these issues were at some point raised to leadership. I've spent a lot of political capital on the issue and decided that it's not a hill I'm prepared to die on. Either a crop of new hires will come along and improve the situation with their fresh-eyed optimism, or it'll just keep happening and I'll try to remain zen.

And there's certainly a calculus to it that changes when you're an executive. To me, craftsmanship, diligence, and engineering excellence are important, not just because I love programming but also because I'm an IC and it affects me directly. To an executive, I am just some weird nerd they have to pay a lot of money to make computers do things. Beautiful code and a serene on-call experience are nice but they don't usually get a company acquired.

It probably isn't worth it, considering it might hurt the replier's career. I'd never do that. I'd speak to my manager, and if he just gets by then I just get by.