Neat way to validate.

Your method of sampling could be improved further, unfortunately at the expense of ease of use. If the dictionary was sorted according to difficulty, then you could use stratified sampling.

I comment on the related aspects here.

https://news.ycombinator.com/item?id=48599769