I am not sure that text watermarking will be accurate; I foresee plenty of false positives.

In practice, very short texts rarely carry much value, so watermarking them is (usually) less important. For longer texts, false positives are far less of an issue, since you have a large amount of data to extract your signal from.
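
To put rough numbers on that, here is a minimal sketch. It assumes a "green list" style watermark, which is my assumption and not something specified above: each token of unwatermarked text lands on the green list independently with probability `gamma`, and the detector flags a text when at least `threshold_fraction` of its tokens are green. The function name, `gamma = 0.5`, and the 60% threshold are all hypothetical choices for illustration, and real detectors may work quite differently.

```python
from math import ceil, exp, lgamma, log


def false_positive_rate(n_tokens, threshold_fraction=0.6, gamma=0.5):
    """Chance that unwatermarked text of n_tokens is flagged by mistake.

    Assumed model (illustrative only): the number of green tokens in
    unwatermarked text is Binomial(n_tokens, gamma), and the detector
    flags the text when the green fraction reaches threshold_fraction.
    """
    # Smallest green-token count that triggers a detection.
    # The tiny epsilon guards against floating-point round-up in ceil().
    k_min = ceil(threshold_fraction * n_tokens - 1e-9)

    total = 0.0
    for k in range(k_min, n_tokens + 1):
        # Binomial probability computed in log space to avoid overflow
        # for long texts.
        log_pmf = (
            lgamma(n_tokens + 1)
            - lgamma(k + 1)
            - lgamma(n_tokens - k + 1)
            + k * log(gamma)
            + (n_tokens - k) * log(1.0 - gamma)
        )
        total += exp(log_pmf)
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:>5} tokens: false positive rate = {false_positive_rate(n):.3g}")
```

Under these assumptions the false positive rate is substantial for a 10-token text and drops by many orders of magnitude by 1000 tokens, which is the "more data, cleaner signal" effect. The independence assumption is the weak point: heavily templated or repetitive text could behave worse than this model suggests.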