>the things you told it before are still taken into the matrix multiplication mincers and influence outputs just the same.
Not the same, no. The model chooses how much attention to give each token based on the whole current context. Most likely that phrase, or something like it, causes the model to give those earlier tokens far less attention than it would otherwise.
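
To make that concrete, here's a toy sketch of single-head scaled dot-product attention (plain numpy, no learned projections, no masking, all names made up, not any particular model's code). The point is just that each token's weight comes out of a softmax over scores against every token in context, so changing or adding a phrase shifts how much weight the earlier tokens get:

```python
# Toy sketch of scaled dot-product attention (illustrative only; real models
# use many heads, learned Q/K/V projections, causal masking, etc.).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Each query token's weights are a softmax over scores against *all* keys,
    # so how much an earlier token contributes depends on everything else in context.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

# Toy example: 4 tokens with 8-dim embeddings. Changing any one token's vector
# shifts the softmax, and with it the weight every other token receives.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out, w = attention(X, X, X)
print(w.round(2))  # each row sums to 1: the "attention budget" for that token
```

So yes, earlier tokens still pass through the same matrix multiplications, but their weights are not fixed; they're renormalized against whatever else is in the context window.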