I've run into this problem as well. Best results I've gotten is to over-explain what the stop criteria are. eg end with a phrase like
> You are done when all steps in ./plan.md are executed and marked as complete or a unforeseen situation requires a user decision.
Also as a side note, asking 5.4 explain why it did something, returns a very low quality response afaict. I would advice against trusting any model's response, but for Opus I at least get a sense it got trained heavily on chats so it knows what it means to 'be a model' and extrapolate on past behavior.