Chinese omits articles, verbs aren't conjugated, and individual characters carry more meaning than English letters, but apart from those differences I don't have the impression that Chinese communication is inherently more concise. Some forms of official speech are wordy. Writing is denser, but the amount of information conveyed through speech is about the same. There are jokes about ambiguous words or phrases in both Chinese and English. So I was surprised at your take, though I have no objection to your points above. Ancient Chinese, on the other hand, is extremely concise, but so are other ancient languages like Hebrew, although in a different way. So it seems that ancient languages are compressed but challenging, and modern languages have unpacked the compression for ease of understanding.

I'm going to guess Chinese and English are going to come out about the same once someone invents the right metric to compare them. I recall reading about a study somewhere that compared speech in multiple languages with respect to the amount of information communicated per second, and the reported result was that they were all the same, because speakers of more verbose languages (longer words, simpler grammar) unknowingly compensate by speaking faster than baseline.

That's a really interesting point about Ancient Chinese and other ancient scripts. I'd love to learn more about that.

I'm also more curious about tokenizers for LLMs than I've ever been before, both for Chinese and English. I feel like to understand them I'll need to look at some concrete examples, since tokenization can be per word, per character, or sometimes in chunks somewhere in between.
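
To make those three granularities concrete, here is a minimal toy sketch in Python. It is not how any particular LLM tokenizer works: real tokenizers learn their vocabularies from data, while the vocabulary below (HYPOTHETICAL_VOCAB) and the helper names are made up purely for illustration. It only compares per-character splitting, per-byte splitting (UTF-8), and a greedy longest-match against a fixed vocabulary, which produces the "in between" chunks.

```python
# Toy illustration only. Real LLM tokenizers learn their vocabularies from
# data; the vocabulary and function names here are assumptions for this sketch.

def char_tokens(text):
    """One token per Unicode character (code point)."""
    return list(text)


def byte_tokens(text):
    """One token per UTF-8 byte, the base unit of byte-level tokenizers."""
    return list(text.encode("utf-8"))


def greedy_tokens(text, vocab):
    """Greedy longest-match against a fixed vocabulary.

    Characters not covered by any vocabulary entry fall back to
    single-character tokens.
    """
    max_len = max(len(entry) for entry in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Made-up vocabulary: two English subword chunks and one Chinese word.
HYPOTHETICAL_VOCAB = {"token", "ization", "分词"}

samples = [("English", "tokenization"), ("Chinese", "我喜欢分词")]

for label, text in samples:
    print(label)
    print("  characters:", len(char_tokens(text)))
    print("  UTF-8 bytes:", len(byte_tokens(text)))
    print("  greedy chunks:", greedy_tokens(text, HYPOTHETICAL_VOCAB))
```

With this made-up vocabulary, the English word splits into two subword chunks, while the Chinese sentence splits into single characters plus one two-character word. The byte counts also show why byte-level schemes treat the two scripts differently: each of these Chinese characters takes three bytes in UTF-8, versus one byte per ASCII letter.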