was wondering that myself, so I just tried comparing against the petnames crate -- it gets you about 2 tokens per word on average

not that anyone should ever care; typos in random-looking ids are a very real problem, but human-readable ids already cover that

besides, this is for a specific tokenizer