Think in the reverse direction. Since you can have exact provenance data placed into the token stream, formatted in any particular way, that implies the models should be possible to tune to be more "mindful" of it, mitigating this issue. That's what makes this different.