Another possibility is output watermarking. You can watermark LLM-generated text by subtly biasing the sampling distribution away from the model's actual output distribution in a way that's statistically detectable but invisible to a reader. Given enough text the watermark can be detected with high confidence, which is useful for excluding your own output from pre-training (unless you want it in there... there's plenty of deliberate synthetic data in SFT datasets now, as this post-mortem makes clear).
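To make that concrete, here's a minimal sketch of one published scheme (the "green list" approach from Kirchenbauer et al., 2023), which I'm using purely as an illustration; I have no idea whether any lab actually ships this particular method, and the constants and names below are my own assumptions. The idea: at each step, pseudorandomly split the vocabulary using the previous token as a seed, nudge the logits toward the "green" half, and at detection time count how often tokens landed on their green list. Detection only needs the seeding scheme, not the model, so it's cheap to run over a pre-training corpus.

```python
# Illustrative sketch of green-list watermarking (Kirchenbauer et al., 2023).
# VOCAB_SIZE, GAMMA, and DELTA are assumed values, not anyone's real config.
import hashlib
import math

import numpy as np

VOCAB_SIZE = 50_000   # assumed vocabulary size
GAMMA = 0.5           # fraction of the vocab placed on the "green" list
DELTA = 2.0           # logit bias added to green tokens at generation time


def green_list(prev_token: int) -> np.ndarray:
    """Pseudorandomly partition the vocab, seeded by the previous token."""
    seed = int.from_bytes(hashlib.sha256(str(prev_token).encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(VOCAB_SIZE)
    mask = np.zeros(VOCAB_SIZE, dtype=bool)
    mask[perm[: int(GAMMA * VOCAB_SIZE)]] = True
    return mask


def watermark_logits(logits: np.ndarray, prev_token: int) -> np.ndarray:
    """Bias the next-token logits toward the green list before sampling."""
    return logits + DELTA * green_list(prev_token)


def detect(tokens: list[int]) -> float:
    """z-score of how many tokens fell on their green list; large => watermarked."""
    hits = sum(green_list(prev)[tok] for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

The z-score grows with the square root of the text length, which is why a few hundred tokens is usually plenty to call it one way or the other, while a single short sentence isn't.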
I was told this was possible many years ago by a researcher at Google and have never really seen much discussion of it since. My guess is the labs do it but keep quiet about it to avoid people trying to erase the watermark.