I've been thinking some more about this, and it occurred to me that you'd want to encode sentence boundaries as a pseudo-word in the n-tuples.
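Something along these lines is what I had in mind; this is only a rough sketch, and the "<s>" boundary token, the regex sentence splitter, and the function names are all placeholders of my own, not anything from an existing setup:

```python
# Sketch: insert a boundary pseudo-word between sentences, then count bigrams
# over the padded token stream. All names here are hypothetical.
from collections import Counter
import re

BOUNDARY = "<s>"  # pseudo-word standing in for a sentence boundary

def padded_tokens(text):
    """Yield lowercase word tokens with a boundary pseudo-word before each sentence."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        if words:
            yield BOUNDARY
            yield from words

def bigram_counts(text):
    tokens = list(padded_tokens(text))
    return Counter(zip(tokens, tokens[1:]))

print(bigram_counts("The cat sat. The dog ran. A cat slept."))
# Pairs like ('<s>', 'the') show how boundary-word bigrams pile up.
```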

I then realised that "[period] <word>" pairs would likely dominate the most common pairs, and that a lot of time could be saved by simply recording the first word of each sentence in its own vector set, in addition to, but separate from, the regular word vectors.
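As a sketch of that alternative (again, the names and the sentence splitter are just my placeholders): keep the sentence-initial words in their own counter rather than letting boundary pairs swamp the bigram table.

```python
# Sketch: count sentence-initial words separately from the regular word counts.
# All names here are hypothetical.
from collections import Counter
import re

def split_counts(text):
    initial_counts = Counter()   # first word of each sentence
    regular_counts = Counter()   # every other word
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        if not words:
            continue
        initial_counts[words[0]] += 1
        regular_counts.update(words[1:])
    return initial_counts, regular_counts

initial, regular = split_counts("The cat sat. The dog ran. A cat slept.")
print(initial)  # Counter({'the': 2, 'a': 1})
print(regular)  # Counter({'cat': 2, 'sat': 1, 'dog': 1, 'ran': 1, 'slept': 1})
```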

Whether this would be a stronger or weaker signal per vector space than the tail of words in the regular common-words vector, I don't know.