Just yesterday I was discussing many of the ideas presented here with a coworker. I had just walked out of a workshop led by $BIGTECHCOMPANY where someone presented the following toy example:

A service goes down. The presenter tells the agent to debug it and fix it. The agent pulls some logs from $CLOUDPROVIDER, inspects them, produces a fix, and then automatically updates a shared document with the postmortem.

This got me thinking that it's very hard to internalize both the issue and the solution (updating your mental model of the system involved) because there isn't enough friction to make you spend time with the problem: coming up with hypotheses, modifying the code, writing the doc. I thought about my very human limitation of having to write things down on paper so that I can recall them better.

Then I recalled something I read years ago: "Cars have brakes so they can go fast."

Even assuming it is now feasible to produce thousands of lines of quality code, there is a limitation on how much a human can absorb and internalize about the changes introduced to a system. This is why we will need brakes -- so we can go faster.

The gap in your example is that a human had to realize the system was broken so that he could nudge the agent into fixing it. He can close that gap by updating the agent to recognize when the system breaks. This now becomes the level at which he debugs… did the agent recognize the failure and self-heal, or not?
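To make that shift concrete, here's a minimal sketch of what "debugging at the new level" might look like. Everything here is hypothetical: `health_check`, `diagnose`, and `apply_fix` are stand-ins for whatever the agent actually does (pulling logs, producing a patch, and so on). The point is that the loop returns a record, so the human's question moves up a level, from "what broke?" to "did the agent catch it and heal it?"

```python
def self_healing_loop(health_check, diagnose, apply_fix, max_attempts=3):
    """Detect a failure, let the agent try to fix it, and report what happened.

    Returns a small record so the human can debug at the new level of
    abstraction: did the agent recognize the failure and self-heal, or not?
    """
    if health_check():
        return {"healthy": True, "attempts": 0}
    for attempt in range(1, max_attempts + 1):
        # The agent's work happens here: form a hypothesis, apply a fix.
        hypothesis = diagnose()
        apply_fix(hypothesis)
        if health_check():
            return {"healthy": True, "attempts": attempt, "fix": hypothesis}
    # The agent gave up; this is when the human gets paged.
    return {"healthy": False, "attempts": max_attempts}
```

The record is the "brake": even when the loop succeeds silently, there's an artifact a human can review later, which is what keeps a shared root cause from quietly getting worse.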

And at that point, if the autonomous system breaks, realizes it's broken, and fixes itself before you even notice… then do you need to care whether you learn from it? I suppose this could obfuscate some shared root cause that gets worse and worse, but if your system is robust and fault-tolerant _and_ self-heals, then what is there to complain about? Probably plenty, but now you can complain at one higher level of abstraction.