It appears to be the top n-grams scored by the product of frequency and length. Including the frequency weighting is a bit nonstandard among ablative methods.
See line 233: https://github.com/google/sentencepiece/blob/master/src/unig...
I would suspect the n-gram counts don't cross pre-token boundaries, but I don't have time to find that in the code right now.
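For anyone who wants to poke at the idea, here is a rough sketch of "score = frequency × length" over substrings that stay inside pre-tokens. It is not the sentencepiece code (which, as far as I can tell, enumerates substrings with a suffix array); `seed_candidates`, `max_len` and `top_n` are names I made up, and skipping single characters is my assumption.

```python
from collections import Counter


def seed_candidates(pretokens, max_len=4, top_n=5):
    """Rank substrings by frequency * length (illustrative sketch only)."""
    counts = Counter()
    for tok in pretokens:
        # Substrings are taken inside one pre-token at a time,
        # so no candidate ever spans a pre-token boundary.
        for i in range(len(tok)):
            # Lengths 2..max_len; single characters are skipped (assumption).
            for j in range(i + 2, min(i + max_len, len(tok)) + 1):
                counts[tok[i:j]] += 1
    scored = {s: c * len(s) for s, c in counts.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    print(seed_candidates(["abab", "ab"], top_n=2))
```

With the length factor, "abab" (seen once, length 4) outranks "ba" (seen once, length 2), which would tie on frequency alone.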
You can let pieces cross whitespace boundaries by setting the flag `--split_by_whitespace` to false (it's true by default).
https://github.com/google/sentencepiece/blob/master/doc/opti...
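If you train from Python, the wrapper takes the same options as keyword arguments, where the option is spelled `split_by_whitespace`. A minimal sketch; the corpus path, model prefix and vocab size are placeholders I picked, not anything from the thread:

```python
# Keyword arguments mirror the spm_train command-line flags.
TRAIN_KWARGS = dict(
    input="corpus.txt",          # hypothetical path to a plain-text corpus
    model_prefix="unigram_ws",   # hypothetical output prefix
    vocab_size=8000,             # illustrative value
    model_type="unigram",
    split_by_whitespace=False,   # allow pieces to span whitespace
)


def train():
    # Imported lazily so the settings can be inspected without the package.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(**TRAIN_KWARGS)
```

Calling `train()` needs the `sentencepiece` package installed and a real corpus at the `input` path.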
For anyone reading this in the future: I meant to say that the length weighting is a bit nonstandard. Scoring is usually by frequency. Oops.