Hacker News

curuinor a day ago

You can't assume a Gaussian underlying distribution for word knowledge; it's known to be Zipfian. So you can't be doing ANOVAs or anything of that nature, because if you look up the Zipfian distribution's variance, you get Nature and Reality giving you the middle finger.
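To make the variance point concrete, here is a minimal sketch, assuming a Zipf (zeta) distribution with P(k) proportional to k^-s. The exponent 2.5, the cutoffs, and the function name `truncated_zipf_moments` are illustrative choices, not anything measured from vocabulary data. For exponents at or below 3 the variance of the untruncated distribution is infinite, so the truncated variance keeps growing as the cutoff rises while the mean settles down.

```python
def truncated_zipf_moments(s, n_max):
    """Mean and variance of P(k) proportional to k**-s on k = 1..n_max."""
    weights = [k ** -s for k in range(1, n_max + 1)]
    total = sum(weights)
    mean = sum(k * w for k, w in enumerate(weights, start=1)) / total
    second = sum(k * k * w for k, w in enumerate(weights, start=1)) / total
    return mean, second - mean ** 2


# Hypothetical exponent s = 2.5: the mean converges, the variance does not.
for n_max in (10**2, 10**3, 10**4, 10**5):
    mean, var = truncated_zipf_moments(2.5, n_max)
    print(n_max, round(mean, 3), round(var, 1))
```

A sample variance from data like this never stabilizes as the sample grows, which is why variance-based tests such as ANOVA would be a poor fit if the Zipfian premise held.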

dgacmu 20 hours ago

I think you mean it's lognormal, at least if we're discussing native English speakers or comparing those with similar amounts of exposure to the language.

(The median English speaker almost certainly knows several thousand words, or word stems to avoid duplication. But the number who know all words in the tail is exceptionally small.)

soVeryTired a day ago

No way is vocab size Zipfian. Word counts from a corpus follow Zipf's law, but vocab sizes themselves do not.

Otherwise the most common vocab size would be equal to one.
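A rough sketch of that argument, assuming a Zipf distribution with P(k) proportional to k^-s and, for contrast, the lognormal suggested elsewhere in the thread. The helper names and every parameter value below are made up for illustration; none of them are estimates of real vocabulary sizes.

```python
import math


def zipf_mode(s, n_max):
    """Most probable value of P(k) proportional to k**-s on k = 1..n_max."""
    return max(range(1, n_max + 1), key=lambda k: k ** -s)


def lognormal_mode(mu, sigma):
    """Mode of a lognormal distribution: exp(mu - sigma**2)."""
    return math.exp(mu - sigma ** 2)


# For any positive exponent, k**-s is largest at k = 1.
print(zipf_mode(1.1, 100_000))

# Hypothetical lognormal parameters (median of 20,000 words, sigma = 0.5).
print(round(lognormal_mode(math.log(20_000), 0.5)))
```

Because k^-s is strictly decreasing in k, a Zipfian vocabulary-size distribution would peak at a one-word vocabulary, whereas a lognormal peaks well away from zero.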

montag a day ago

Not to mention, N=1