I agree with you completely up until this line:
> The agent cannot learn from its mistakes.
If feedback from this incident is in its context window, it is highly unlikely to make this same mistake again. Yes, this is only probabilistic, but so is a human learning from mistakes. The key difference is that a human is unlikely to have that memory stripped away in a relevant situation, while for an agent the feedback must be strategically put into its context.
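To make "strategically put there" concrete, here is a minimal sketch of one way to do it: persist incident feedback to a file and prepend it to every new prompt. All the names here (`call_model`, `lessons.txt`) are hypothetical stand-ins, not any particular agent framework's API.

```python
from pathlib import Path

LESSONS_FILE = Path("lessons.txt")  # persisted across sessions

def record_lesson(text: str) -> None:
    """Append feedback from an incident so future runs can see it."""
    with LESSONS_FILE.open("a") as f:
        f.write(text.strip() + "\n")

def build_prompt(task: str) -> str:
    """Prepend accumulated lessons to every new task prompt."""
    lessons = LESSONS_FILE.read_text() if LESSONS_FILE.exists() else ""
    return f"Past mistakes to avoid:\n{lessons}\nTask:\n{task}"

def call_model(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real LLM call
```

The point is just that this memory is external to the model: if nothing writes to and reads from that file, the "lesson" evaporates with the context window.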
> If feedback from this incident is in its context window, it is highly unlikely to make this same mistake again
If this incident gets into its training data, then it's highly likely to repeat it, confession and all, since this is a text predictor, not a thinker.
Or not, because telling the agent it is misbehaving may predispose it to that misbehavior, even though the whole point of telling it was to get it to stop.
I remember this being discussed when a similar issue went viral: someone was building a product with Replit's AI agent, and it deleted his prod database.
> Yes this is only probabilistic, but so is a human learning from mistakes.
Yet since I'm also a human being and can work to understand the mistake myself, the probability that I can expect a correction of the behavior is much higher. I have found it helps significantly if there's an actual, reasonable paycheck on the line.
As opposed to the language model, which demands that I drop more quarters into its slot and then hope for the best. An arcade model of work if there ever was one. Who wants that?
> If feedback from this incident is in its context window, it is highly unlikely to make this same mistake again.
In my experience, this isn't true. At least as of a version or so ago of ChatGPT, I could make it trip up on custom wordplay games: when called out, it would acknowledge the failure, explain how it had broken the rules of the game, then proceed to make the same mistake a couple of sentences later.