Maybe there isn't enough data per user for pairs, but I thought about mixing the two approaches (though I had no time to do it): keep the 350 components we have now for single-word frequencies, plus another 350 for the most common pair frequencies. That way part of the vector would still carry a strong enough signal even for users with comparatively less data.
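Roughly what I had in mind, as a minimal sketch (the tokenizer, the `TOP_N = 350` constant, and the function names are just illustrative assumptions, not what we actually run):

```python
from collections import Counter

TOP_N = 350  # per feature family, matching the 350 components mentioned above

def tokenize(text):
    # Naive whitespace tokenizer; the real pipeline presumably does better.
    return text.lower().split()

def build_vocabularies(corpus_texts):
    """Pick the TOP_N most common words and word pairs across the whole corpus."""
    word_counts, pair_counts = Counter(), Counter()
    for text in corpus_texts:
        words = tokenize(text)
        word_counts.update(words)
        pair_counts.update(zip(words, words[1:]))
    top_words = [w for w, _ in word_counts.most_common(TOP_N)]
    top_pairs = [p for p, _ in pair_counts.most_common(TOP_N)]
    return top_words, top_pairs

def user_vector(user_text, top_words, top_pairs):
    """700-component vector: 350 word frequencies followed by 350 pair frequencies."""
    words = tokenize(user_text)
    pairs = list(zip(words, words[1:]))
    word_freq, pair_freq = Counter(words), Counter(pairs)
    n_words, n_pairs = max(len(words), 1), max(len(pairs), 1)
    return ([word_freq[w] / n_words for w in top_words] +
            [pair_freq[p] / n_pairs for p in top_pairs])
```

The point is simply that the two halves are normalised independently, so a user with little pair data still has a usable word-frequency half.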
I've been thinking some more about this, and it occurred to me that you'd want to encode sentence boundaries as a pseudo-word in the n-tuples.
I then realised that "[period] <word>" pairs would likely dominate the most common pairs, and that a lot of time could be saved by simply recording the first word of each sentence as its own vector set, in addition to but separate from the regular word vector.
Whether this would be a stronger or weaker signal per vector space than the tail of words in the regular common-words vector, I don't know.
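For concreteness, here is a rough sketch of the sentence-initial word idea; the regex-based sentence split and the names here are my own placeholder assumptions, not an actual implementation:

```python
import re
from collections import Counter

SENTENCE_SPLIT = re.compile(r'[.!?]+')  # crude sentence-boundary detection

def sentence_initial_counts(text):
    """Count how often each word opens a sentence, kept separate from the regular word counts."""
    counts = Counter()
    for sentence in SENTENCE_SPLIT.split(text):
        words = sentence.lower().split()
        if words:
            counts[words[0]] += 1
    return counts

def first_word_vector(text, top_first_words):
    """Frequency vector over the corpus-wide most common sentence-opening words."""
    counts = sentence_initial_counts(text)
    total = max(sum(counts.values()), 1)
    return [counts[w] / total for w in top_first_words]
```

This gives the same information as the "[period] <word>" pseudo-word pairs, but without letting them crowd the regular pair vocabulary.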