was wondering myself, just tried comparing against the petnames crate -- gets you about 2 tokens per word on average
not that anyone should ever care; typos in random-looking ids are very real but already covered by human-readable ids
besides, this is for a specific tokenizer
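
for anyone who wants to redo the comparison with their own tokenizer, here's a rough sketch of the measurement. everything in it is an assumption for illustration: `avg_tokens_per_word` and `toy_tokenize` are made-up names, the toy tokenizer just chops words into 4-character chunks, and the word list is a stand-in, not the petnames list. swap in your real tokenizer's encode function and real word list to get a number that means anything.

```python
from typing import Callable, Iterable, List


def avg_tokens_per_word(
    words: Iterable[str], tokenize: Callable[[str], List[str]]
) -> float:
    """Mean number of tokens the given tokenizer produces per word."""
    words = list(words)
    if not words:
        raise ValueError("need at least one word")
    return sum(len(tokenize(w)) for w in words) / len(words)


def toy_tokenize(word: str) -> List[str]:
    # Hypothetical stand-in for a real tokenizer: fixed 4-character chunks.
    return [word[i:i + 4] for i in range(0, len(word), 4)]


# Made-up sample words, NOT the actual petnames word list.
sample_words = ["able", "badger", "cat", "dolphin"]

print(avg_tokens_per_word(sample_words, toy_tokenize))  # prints 1.5
```

the result depends entirely on which tokenizer you pass in, so numbers from one tokenizer won't carry over to another.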