Hm, that’s a multinomial classification with a very high cardinality. It’s really weird it works. I’m sure it does as the author states, but for how many authors (out of the whole web) does this work?

It worked on me, and I would be shocked if my blog (dmd.3e.org) has more than a dozen readers. I am stunned.

It's not about the readers, just the fact that there's enough of a sample that it can use, with sufficient differentiation from other content.

I’ve posted on average 3 things a year.

There are ~8 billion people. Sounds big, but it's only about 2^33. I.e., if you can find 33 things about the text that each halve the number of possible writers, you have narrowed it down to 1 person.

Add just a couple more such things and you can also tolerate some of them being mistaken/wrong/uncertain.
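
To make the arithmetic concrete, here is a rough sketch. It assumes the ~8 billion figure and that each feature cuts the candidate pool exactly in half, independently of the others, which real stylistic features won't do perfectly. The function names are made up for illustration.

```python
import math

WORLD_POPULATION = 8_000_000_000  # rough figure, an assumption


def bits_to_identify(population: int) -> int:
    """Number of perfect yes/no splits needed to single out one person."""
    return math.ceil(math.log2(population))


def candidates_left(population: int, halving_features: int) -> float:
    """Expected candidates remaining after that many ideal halvings."""
    return population / 2 ** halving_features


bits = bits_to_identify(WORLD_POPULATION)
print("halvings needed:", bits)
print("candidates left after that many:", candidates_left(WORLD_POPULATION, bits))
print("candidates left with one fewer:", candidates_left(WORLD_POPULATION, bits - 1))
```

Features that split the pool less evenly, or that correlate with each other, carry less than one bit each, so in practice you would need more than this idealized count.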

Sure, the cardinality is high, but the model isn't using a uniform prior. What do you suppose the values of each of the terms are in P(Kelsey Piper | Text sample) = P(Text sample | Kelsey Piper) * P(Kelsey Piper) / P(Text sample)?
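
A toy sketch of why the prior matters, using Bayes' rule: posterior is likelihood times prior, divided by the total probability of the text. All author labels and numbers below are invented for illustration; nothing here reflects how any actual model computes its guess.

```python
def posterior(likelihoods: dict, priors: dict) -> dict:
    """Bayes' rule over a closed set of candidate authors.

    likelihoods[a] = P(text | a), priors[a] = P(a).
    Returns P(a | text) for each candidate a.
    """
    joint = {a: likelihoods[a] * priors[a] for a in priors}
    evidence = sum(joint.values())  # P(text), summed over all candidates
    return {a: j / evidence for a, j in joint.items()}


# Hypothetical values: a widely read writer gets a much larger prior
# than an obscure blogger, even if the text fits the blogger better.
priors = {"prolific_writer": 0.01, "obscure_blogger": 0.0001, "everyone_else": 0.9899}
likelihoods = {"prolific_writer": 0.2, "obscure_blogger": 0.9, "everyone_else": 0.001}

for author, p in posterior(likelihoods, priors).items():
    print(f"{author}: {p:.3f}")
```

With these made-up numbers the well-known writer wins despite the lower likelihood, which is the point about a non-uniform prior: the obscure author needs a very distinctive sample to overcome it.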

Maybe it just says all writing is Kelsey Piper.