As another user suggested, train on the corpus that ends with the white paper publication.

That’s not feasible. Apparently only SOTA models exhibit this behavior, and setting the training cutoff at the whitepaper’s publication would significantly hinder the model’s capabilities. Besides that, try convincing anyone to spend millions upon millions of dollars to train a model whose primary goal is to maybe deanonymize one person.

But then compare it to the corpus any of the suspects has produced since the whitepaper’s publication.

It's one thing to sound like Satoshi before the whitepaper, but does anyone still sound like Satoshi?
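For what it’s worth, that kind of comparison doesn’t need an LLM at all: classical stylometry can score two writing samples directly. A minimal sketch using character trigram cosine similarity (the two samples below are placeholders, not real corpora; a serious attempt would need large, genre-matched text from each suspect):

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Lowercased character n-gram counts for a text sample."""
    text = " ".join(text.lower().split())  # normalize whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 to 1.0)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Placeholder samples only -- real stylometry needs far more text.
satoshi_sample = ("The proof-of-work also solves the problem of determining "
                  "representation in majority decision making.")
suspect_sample = ("Proof of work determines representation when the majority "
                  "must decide.")

score = cosine_similarity(char_ngrams(satoshi_sample), char_ngrams(suspect_sample))
print(round(score, 3))
```

Even then, the objection stands: a suspect writing in 2009 and the same person writing fifteen years later may score poorly against each other, so a low similarity to pre-whitepaper Satoshi proves little.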