Yes, but isn't the cat already out of the bag? Don't we already have sufficiently strong local models that can be fine-tuned in various ways to rewrite text or alter images, and thus destroy any watermark?

Sure, in some cases a model might do something so distinctive that it always shines through, but I'd say the jury is still out on these questions.