Ultimately you always have to trust people to be judicious, but that's why it doesn't make any changes itself. Only suggests mitigations (and my team knows what actions are safe, has context for recent changes, etc). It's not entirely a black box though. e.g. I've prompted it to collect and provide a concrete evidence chain (relevant commands+output, code paths) along with competing hypotheses as it works. Same as humans should be doing as they debug (e.g. don't just say "it's this"; paste your evidence as you go and be precise about what you know vs what you believe).
That's sounds like the perfect recipe for turning a small problem into a much larger one. 'on call' is where you want your quality people, not your silicon slop generator.
Ultimately you always have to trust people to be judicious, but that's why it doesn't make any changes itself. Only suggests mitigations (and my team knows what actions are safe, has context for recent changes, etc). It's not entirely a black box though. e.g. I've prompted it to collect and provide a concrete evidence chain (relevant commands+output, code paths) along with competing hypotheses as it works. Same as humans should be doing as they debug (e.g. don't just say "it's this"; paste your evidence as you go and be precise about what you know vs what you believe).
That's sounds like the perfect recipe for turning a small problem into a much larger one. 'on call' is where you want your quality people, not your silicon slop generator.