Hacker News

If you already know a word, just mark it as "already known". If you already know all the words it's showing you, that will cause the difficulty to ramp up very quickly as it starts skipping ahead to find words you might not know. (If you scroll down to the bottom of the page and open the "graphs" section, you can see the logic behind it.)

The app "only" includes about the 3,000 most common words, so if you're past that level, I don't know how helpful it will be to you. I can easily extend this in the future, I just need bigger corpus with more data.

yorwba 2 days ago [ - ]

The ramp up feels rather slow. For 3000 words, a binary search should take less than 12 steps to find the right level. Maybe it's because you add 5 new words each time, turning those 12 steps into 60 words instead?

Also, I'm confused that you say you would need a bigger corpus for more words, since your readme says that you use the OpenSubtitles data from OPUS. Their 2024 release has tens of millions of sentences for each language, which surely should be enough for tens of thousands of unique words?

ChadNauseam a day ago [ - ]

Thanks for the feedback. I’ve tuned the initial ramp up to be more aggressive and I’ve made it so it adds words in smaller increments at first. Now, after adding 16 words and marking them all as known, it skips to the ~500th most common word. After adding 25, it skips to the ~100th.

What you say about binary search as a good point. I initially used something more like a straightforward binary search, but the issue is that the ramp up is too quick and beginner users would end up adding a bunch of words that were way too advanced for that level. So I tried to make it less aggressive to avoid overshooting, but I guess that has the opposite issue of it taking longer for advanced users. I’ll think about what I can do about that.

For the corpus, I prefer to use Neri’s sentence lists as they’re much higher quality than opensubtitles. You’d be surprised at the problems it has. So I only use opensubtitles for korean (because Neri’s sentence lists doesn’t have a korean version).

ChadNauseam a day ago [ - ]

Correction: After adding 25, it skips to the ~1000th