Another point is that DWFM likely runs in a privileged, isolated network because it needs access deep into the core control plane. After all, you don't want a rogue service to be able to add a malicious agent to a customer's VPC.

And since this network is privileged, observability tools, debugging support, and possibly even access to the network itself are more complicated. The set of engineers who have access is likely more limited too, especially at 2 AM.

Should AWS relax these controls to make recovery easier? Doing so would also make the system less secure. Again, it's a trade-off.