From the paper:

“rather than encoding a specific message, this watermark encodes an image of the unmanipulated scene as it would appear lit only by the coded illumination”

They are including scene data, presumably cryptographically signed, in the watermark, which allows for a consistency check that is not easily faked.

That's just saying that the coded image will only be apparent in the areas of the image lit by the light, which is obvious: that's how a flashlight works too. They're not signing the actual pixels or anything. They've increased the difficulty to that of 3D-mapping the scene and transferring the lighting, which is not trivial, but those are still two long-studied problem spaces.
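For anyone who wants to poke at the idea, here is a toy sketch of how an "image lit only by the coded illumination" could be pulled out of a video. It is not the paper's method. It assumes the light's brightness follows a known pseudo-random ±1 code per frame, and that each pixel is just ambient light plus a small code-modulated term. The names `recover_code_image`, `AMP`, `ambient`, and `code_lit` are made up for illustration.

```python
import random

def recover_code_image(frames, code):
    """Correlate each pixel's time series with the zero-mean code signal."""
    mean_c = sum(code) / len(code)
    cz = [c - mean_c for c in code]          # zero-mean code cancels ambient light
    energy = sum(z * z for z in cz)
    n_pixels = len(frames[0])
    return [sum(z * f[p] for z, f in zip(cz, frames)) / energy
            for p in range(n_pixels)]

# Hypothetical setup: 200 frames of a 4-pixel "scene".
rng = random.Random(0)
code = [rng.choice((-1.0, 1.0)) for _ in range(200)]   # assumed known code
ambient = [0.5, 0.2, 0.8, 0.4]    # scene under ordinary lighting
code_lit = [1.0, 0.3, 0.6, 0.0]   # scene under the coded light only; pixel 3 is unlit
AMP = 0.05                        # assumed modulation depth of the coded light

frames = [[a + AMP * c * l for a, l in zip(ambient, code_lit)] for c in code]
recovered = recover_code_image(frames, code)   # approximately AMP * code_lit

# Simulated edit: pixel 2 is overwritten with static content in every frame,
# so it no longer carries the code modulation.
tampered = [row[:] for row in frames]
for row in tampered:
    row[2] = 0.9
recovered_tampered = recover_code_image(tampered, code)
```

In this toy model the edited pixel drops out of the recovered image, which is the kind of consistency check described above. A pixel the coded light never reached (pixel 3) also recovers as zero, which is the flashlight point. Nothing here is signed, and a forger who can re-render the scene under the coded light would pass the check.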