Hacker News

Thanks for the feedback. I’ve tuned the initial ramp-up to be more aggressive, and it now adds words in smaller increments at first. Now, after adding 16 words and marking them all as known, it skips to the ~500th most common word. After adding 25, it skips to the ~100th.

What you say about binary search is a good point. I initially used something closer to a straightforward binary search, but the ramp-up was too quick, and beginners would end up adding a bunch of words that were way too advanced for their level. So I made it less aggressive to avoid overshooting, but I guess that has the opposite problem: it takes longer for advanced users. I’ll think about what I can do about that.
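To make the trade-off concrete, here is a rough sketch of one possible middle ground (not the app's actual code): ramp up with growing steps until the first unknown word, then binary search only inside the bracket that was found. The names `estimate_level`, `knows_word`, `max_rank`, and `growth` are all made up for illustration, and it assumes a user knows every word up to some frequency rank and none past it, which real vocabularies only roughly follow.

```python
def estimate_level(knows_word, max_rank, growth=2.0):
    """Estimate the highest frequency rank the user knows.

    knows_word(rank) -> bool is a hypothetical callback that asks the
    user about the word at the given frequency rank (1 = most common).
    Assumes knowledge is monotone in rank, which is a simplification.
    """
    low = 0      # highest rank confirmed known (0 = nothing confirmed yet)
    step = 1.0   # current jump size; grows after every known word
    rank = 1

    # Phase 1: ramp up. A smaller `growth` is gentler on beginners,
    # a larger one gets advanced users to their level faster.
    while rank <= max_rank and knows_word(rank):
        low = rank
        step *= growth
        rank = low + int(step)

    # First rank treated as unknown: either the word the user missed,
    # or one past the end of the list if they knew everything asked.
    high = min(rank, max_rank + 1)

    # Phase 2: binary search strictly between low (known) and high (unknown).
    while high - low > 1:
        mid = (low + high) // 2
        if knows_word(mid):
            low = mid
        else:
            high = mid
    return low


if __name__ == "__main__":
    # Simulated user who knows the 300 most common words.
    print(estimate_level(lambda rank: rank <= 300, max_rank=5000))
```

Because the ramp stops at the first unknown word, a beginner never gets pushed far past their level, while an advanced user still reaches high ranks in a number of questions that grows roughly logarithmically with their level.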

For the corpus, I prefer to use Neri’s sentence lists, as they’re much higher quality than OpenSubtitles. You’d be surprised at the problems OpenSubtitles has. So I only use OpenSubtitles for Korean (because Neri’s sentence lists don’t have a Korean version).