Just because it isn't mechanically forgetting everything doesn't mean the phrase has no real effect (i.e. that it's just 'pretend'). Mechanically, a Transformer decides, based on all of the current context, how much attention weight to give each preceding token. Very likely, the phrase causes the model to attend far less to those earlier tokens, which alleviates context rot in many (or at least a non-negligible fraction of) scenarios.
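To make that concrete, here's a minimal numpy sketch of single-head causal self-attention, not any particular model's implementation, just the standard softmax-over-scores mechanism. The point is that the weights each position assigns to earlier tokens are computed from the whole current context, so a later instruction can in principle push those weights toward zero. The toy values and names are purely illustrative.

```python
import numpy as np

def causal_attention_weights(q, k):
    """Toy single-head attention: each position's weights over the
    preceding tokens are a softmax of query-key scores, so how much
    'attention' earlier tokens get depends on the full current context."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (T, T) similarity scores
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores = np.where(mask, -np.inf, scores)           # causal mask: no attending to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

# Illustrative only: if the learned query/key projections map a
# "forget the above" instruction to queries that score low against
# earlier keys, those earlier tokens end up with near-zero weight.
rng = np.random.default_rng(0)
T, d = 6, 8
q, k = rng.normal(size=(T, d)), rng.normal(size=(T, d))
print(causal_attention_weights(q, k).round(2))
```

Nothing is deleted from the KV cache here; the earlier tokens are still present, they just receive tiny weights, which is consistent with "not forgetting, but paying much less attention".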