The general idea is for the signature to be random each time, but verifiable. There are a bajillion approaches to this, but a simple starting point is to generate a random nonce, sign it with your private key, then publish the nonce and signature along with the public key. Only you know the private key, so only you could have produced a signature that verifies against that nonce under the public key. Also, critically, every signature is different (that's what the nonce is for). If two videos appear to have the same signature, even if that signature is valid, one of them must be a replay and is therefore almost certainly fake.
(Practical systems often include a generational index or a timestamp, which further helps to detect replay attacks.)
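To make the nonce-plus-timestamp idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the toy textbook-RSA key (built from two Mersenne primes, with no padding, and not secure), the helper names `sign`, `verify`, `make_token`, and the `ReplayDetector` class. A real system would use a vetted signature scheme from a proper crypto library.

```python
import hashlib
import secrets
import time

# Toy textbook-RSA key from two Mersenne primes. Illustration only, NOT secure.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def _digest(message: bytes) -> int:
    """Hash the message and reduce it into the RSA modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of D can compute this value."""
    return pow(_digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public pair (N, E) can check the signature."""
    return pow(signature, E, N) == _digest(message)


def make_token(now=None):
    """Fresh random nonce plus timestamp, signed together."""
    nonce = secrets.token_bytes(16)
    timestamp = int(time.time() if now is None else now)
    message = nonce + timestamp.to_bytes(8, "big")
    return message, sign(message)


class ReplayDetector:
    """Remembers every valid token seen so far and flags repeats."""

    def __init__(self):
        self.seen = set()

    def check(self, message: bytes, signature: int) -> str:
        if not verify(message, signature):
            return "invalid"
        if message in self.seen:
            return "replay"
        self.seen.add(message)
        return "fresh"
```

A verifier would call `check` once per video: the first sighting of a token comes back as "fresh", and a second video carrying the same token comes back as "replay" even though its signature is valid.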
I think for the approach discussed in the paper, bandwidth is the key limiting factor, especially as video compression mangles the result, and ordinary news reporters edit the footage for pacing reasons. You want short clips to still be verifiable, so you can ask questions like "where is the rest of this footage" or "why is this played out of order" rather than just going, "there isn't enough signature left, I must assume this is entirely fake."
But the point is that you'd be extracting the nonce from someone else's existing video of the same event.
If a celebrity says something and person A films a true video, and person B films a video and then manipulates it, you'd be able to see that B's light code is different. But if B simply takes A's lighting data and applies it to their own video, now you can't tell which is real.
I am not defending the proposed method, but your criticism is not the reason it falls short:
Let's assume the pixels have an 8-bit luminance depth, the 7 most significant bits carry the image, and the signature is coded in the least significant bit of the pixels in a frame. A hash of the 7-bit image frame could be cryptographically signed. You could copy the 8th bit plane to a fake video, but the signature would not check out in a verifying media player, since the fake video's leading 7 bit planes won't hash to the value that was signed.
What does this change compared to the status quo? Nothing: you can already hash and sign a full 8-bit video, and Serious-Oath that it depicts Real imagery. Your signature would also not be transplantable to someone else's video, so others can't put fake video in your mouth.
The only difference: if the signature is generated by the image sensor, and end users are unable to extract the private key, then it decreases the number of people/entities able to credibly fake a video, but it gives manufacturers great power to sign fake videos while the masses cannot (unless they play a fake video on a high-quality screen filmed by an image sensor that contains the manufacturer's private key).
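The bit-plane scheme above can be sketched in a few lines. This is a rough illustration under stated assumptions, not anyone's actual implementation: a frame is a flat `bytes` object of 8-bit luminance values, HMAC-SHA256 stands in for a real public-key signature, and `KEY`, `embed`, `verify`, and `transplant_lsb` are hypothetical names.

```python
import hashlib
import hmac

KEY = b"demo-signing-key"  # hypothetical secret; stand-in for a private key
TAG_BITS = 256             # size of an HMAC-SHA256 tag in bits


def top7(frame: bytes) -> bytes:
    """Keep the 7 most significant bits of every pixel."""
    return bytes(p & 0xFE for p in frame)


def embed(frame: bytes, key: bytes) -> bytes:
    """Tag the top 7 bit planes and write the tag into the lowest bit plane."""
    if len(frame) < TAG_BITS:
        raise ValueError("frame too small to carry the tag")
    tag = hmac.new(key, top7(frame), hashlib.sha256).digest()
    bits = [(byte >> (7 - i)) & 1 for byte in tag for i in range(8)]
    out = bytearray(top7(frame))
    for i, bit in enumerate(bits):
        out[i] |= bit
    return bytes(out)


def verify(frame: bytes, key: bytes) -> bool:
    """Read the tag back from the lowest bit plane and recompute it."""
    if len(frame) < TAG_BITS:
        return False
    bits = [p & 1 for p in frame[:TAG_BITS]]
    tag = bytes(
        sum(bits[i * 8 + j] << (7 - j) for j in range(8))
        for i in range(TAG_BITS // 8)
    )
    expected = hmac.new(key, top7(frame), hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


def transplant_lsb(src: bytes, dst: bytes) -> bytes:
    """Attack: copy the lowest bit plane of src onto the pixels of dst."""
    return bytes((d & 0xFE) | (s & 1) for s, d in zip(src, dst))
```

The transplanted frame fails `verify` because the tag covers the upper 7 bit planes, and those belong to the fake footage.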
The bandwidth of the encoding is too low for playing cryptographic games. This doesn't preclude faking a video by introducing the code into your faked video--it's just that that is much, much more difficult than stringing pieces together in an incorrect fashion.
This is more akin to spread spectrum approaches--you can perfectly well know the signal is there and yet finding it without knowing the key is difficult. That's why old GPS receivers took a long time to lock on--all the satellites are transmitting on top of each other, just with different keys and the signal is way below the noise floor. You apply the key for each satellite and see if you can decode something. These days it's much faster because it's done in parallel.
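As a toy illustration of that "apply each key and see if something decodes" step, the sketch below sums several keyed pseudo-random chip sequences, buries them in noise, and correlates against candidate keys. The names (`code_for`, `transmit`, `correlate`), the chip count, and the noise level are assumptions for the demo. It also skips the code-phase and Doppler search a real GPS receiver has to do.

```python
import random

CHIPS = 4096  # chips per data bit (arbitrary value for the demo)


def code_for(key: int) -> list:
    """Pseudo-random +/-1 spreading code derived from the key."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(CHIPS)]


def transmit(bits_by_key: dict, noise_sigma: float, seed: int = 0) -> list:
    """All transmitters share the channel; each sends one +/-1 data bit."""
    noise = random.Random(seed)
    rx = [noise.gauss(0.0, noise_sigma) for _ in range(CHIPS)]
    for key, bit in bits_by_key.items():
        for i, chip in enumerate(code_for(key)):
            rx[i] += bit * chip
    return rx


def correlate(rx: list, key: int) -> float:
    """Despread with one key: near +/-1 if that key is present, near 0 if not."""
    code = code_for(key)
    return sum(r * c for r, c in zip(rx, code)) / CHIPS


# Three transmitters at amplitude 1, noise sigma 4: each signal sits well
# below the noise, yet the right key still pulls its data bit out.
rx = transmit({11: +1, 22: -1, 33: +1}, noise_sigma=4.0)
for key in (11, 22, 33, 44):
    print(key, round(correlate(rx, key), 2))
```

Key 44 was never transmitted, so its correlation stays close to zero, while the three real keys come out near +1 or -1. Trying keys one after another is the slow serial search; running the correlations side by side is the parallel version.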
What would that change?