Regarding $O(n L^2)$ vs $O(n L)$: that's because we somewhat sloppily use the term 'tokenization' both for training a tokenizer vocabulary and for tokenizing a given document. In the paper, we tried to always call the latter segmentation or inference. The former is $O(n L^2)$ per iteration, while the latter is $O(n L)$. I'll update the README to be more explicit about this.

No, the segmentation algorithm you have implemented has runtime $O(n L^2)$. In particular, if you want to do a hash table lookup using a string key, that takes time proportional to the length of the string, not constant time.

That's an interesting point. While you're correct, of course, it's so common to treat a hash table lookup as an $O(1)$ operation that it never occurred to me. But in this case, the loops are really tight and the hash table lookups might account for a significant part of the time, so it might well behave more like $O(n L^2)$. I'll update the docs and paper.
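To make the hidden factor of $L$ concrete, here is a minimal greedy longest-match segmentation sketch (hypothetical, not the actual implementation from the paper). The outer loop advances through the $n$ characters, the inner loop tries up to $L$ candidate lengths, and each `in vocab` check must hash a substring of length up to $L$, so the total work is $O(n L^2)$ rather than $O(n L)$:

```python
def segment(text, vocab, max_len):
    """Greedy longest-match segmentation (illustrative sketch only).

    vocab: a set of token strings; max_len: longest token length L.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first: up to L iterations...
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:  # ...and hashing `piece` costs O(len(piece))
                tokens.append(piece)
                i += length
                break
        else:
            # Fall back to emitting a single character if nothing matches.
            tokens.append(text[i])
            i += 1
    return tokens
```

For example, `segment("hello", {"he", "llo"}, 3)` returns `["he", "llo"]`. Caching substring hashes (or using a trie walk instead of repeated set lookups) would remove the extra factor of $L$.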