I can’t quantify how much time my team wasted in diagnosing production glitches on checking the wrong time offsets but it was substantial. One of our systems wasn’t using UTC, and given enough time the fact that Slack wasn’t using it either does become an issue. When an outage transitions to All Hands on Deck everyone needs to get caught up to what’s going on preferably under their own power so you don’t suffer the Adding Resources to a Late Project problem.

So that first alert that came in ?? minutes ago you need to align with the telemetry and logs in order to see what the servers were doing right before everything went to shit.