My job as a DevOps engineer is to ensure customer uptime. If rebooting is the fastest, we do that. Figuring out the why is the primary developers’ jobs.

This is also a good reason to log everything all the time in a human readable way. You can get services up and then triage at your own pace after.

My job may be different than other’s as I work at an ITSP and we serve business phone lines. When business phones do not work it is immediately clear to our customers. We have to get them back up not just for their business but for the ability for them to dial 911.