I don't know who all here is interested in language learning, but I thought I'd share something I've been working on. I was frustrated by the inefficiency of Duolingo, and while the rational choice might have been to try some other apps, I decided to just make my own. You can use it here: https://yap.town/ - btw, it's totally free and I don't intend to change that.

It's based on pedagogical principles like spaced repetition and the testing effect. I genuinely think it's probably the most efficient language learning app out there, though it's less polished since I only work on it in my spare time. (I haven't tried every language learning app, so I can't conclusively make that determination, but I still think it's the best, for reasons I'll get into.)

By the way, the frontend is mostly Rust compiled to WASM, which enabled performance optimizations that would've been tough in JavaScript. One other thing: the app is local-first and has cross-device sync based on CRDTs, which I figure should be a hit here. Honestly, that was pretty much as much work as the entire rest of the app combined. The source code is here: https://github.com/anchpop/yap
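
If you haven't run into CRDTs before, the key property is that merging is commutative, associative, and idempotent, so devices can exchange state in any order and still converge. Here's a minimal sketch in Python of a last-writer-wins map, one of the simplest CRDTs, just to show the shape of the idea - the names are made up, and Yap's actual (Rust) sync layer is much more involved:

    import time

    class LwwMap:
        """Minimal last-writer-wins map CRDT. Illustration only -
        not Yap's actual data structure."""

        def __init__(self):
            # key -> (timestamp, replica_id, value); tuple ordering
            # breaks timestamp ties deterministically by replica id
            self.entries = {}

        def set(self, key, value, replica_id):
            self.entries[key] = (time.time_ns(), replica_id, value)

        def merge(self, other):
            # Commutative, associative, idempotent: two devices can
            # merge in any order and converge to the same map.
            for key, entry in other.entries.items():
                if key not in self.entries or entry > self.entries[key]:
                    self.entries[key] = entry

        def get(self, key):
            entry = self.entries.get(key)
            return entry[2] if entry else None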

----

Building this taught me a lot about spaced repetition. The core idea with Yap is that it tests you with sentences that contain words you need to practice. But this gets tricky because words have multiple uses. If you mistranslate a word used one way, practicing it in a different context isn't helpful.

My solution uses NLP (via spaCy) to annotate words with their parts of speech and lemmas, which distinguishes different uses and conjugations of the same word. I also maintain a database of "multi-word terms", because many phrases (such as "a lot") need to be learned as units.
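
Concretely, the annotation step looks roughly like this (this is the real spaCy API, but the French model is just an example and this isn't Yap's exact pipeline):

    import spacy

    # pip install spacy && python -m spacy download fr_core_news_sm
    nlp = spacy.load("fr_core_news_sm")

    for token in nlp("Nous avons beaucoup mangé."):
        # lemma_ collapses conjugations ("avons" -> "avoir"); pos_
        # separates different uses of the same surface form
        print(token.text, token.lemma_, token.pos_)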

For spaced repetition, the scheduler is FSRS, which is state of the art.
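
For the curious, the heart of FSRS is a power-law forgetting curve. A sketch of the FSRS-4.5 form (this is the published formula; Yap's integration details differ):

    # FSRS-4.5 forgetting curve: probability of recalling a card t days
    # after a review, given its stability S. The constants are chosen so
    # that retrievability(S, S) == 0.9, i.e. stability is "days until
    # retention drops to 90%".
    DECAY = -0.5
    FACTOR = 19 / 81

    def retrievability(t_days: float, stability: float) -> float:
        return (1 + FACTOR * t_days / stability) ** DECAY

    # The scheduler inverts the curve to pick the next review date for
    # a desired retention level:
    def next_interval(stability: float, desired_retention: float = 0.9) -> float:
        return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)

    assert abs(next_interval(10.0) - 10.0) < 1e-9  # R(S, S) = 0.9 by construction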

For users with prior language exposure, I automatically adjust difficulty by analyzing word frequency against what they seem to find easy, helping me show them the most common words they don't yet know.
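
The idea is simple in spirit, even if the real heuristics are fuzzier. A hypothetical sketch (names made up, not the app's actual code): treat the frequency-ranked word list as a curriculum and always surface the most common words the user hasn't demonstrated they know.

    def next_words(words_by_rank: list[str], known: set[str], n: int = 5) -> list[str]:
        # words_by_rank is sorted most-frequent-first; pick the n most
        # common words the user hasn't already marked or answered as known
        return [w for w in words_by_rank if w not in known][:n]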

Using the app feels odd at first - after learning just a few words, you can already form sentences like "Why did you do this to me?" These sound complex but use only common words. Unlike Duolingo, which teaches you "apple" early on, learning the most frequent words first lets you grasp sentence structure immediately and figure out the remaining words from context.

No app is a complete language learning system, this one included, but I hope it's a useful supplement to whatever else you're doing to learn a language. One good companion is the Pimsleur method, which I've also been using with a lot of success.

----

On Apple platforms, the app requires the latest version of Safari, because I use some APIs that WebKit only recently implemented. Desktop Chrome is always fine, of course, regardless of platform. I've considered fixing this, but it would be kind of a pain, and since I'm primarily making the app for myself, I haven't put much effort into things that wouldn't benefit me.

I like it but it assumes I'm starting the language from scratch. How do I set my level to intermediate?

If you already know a word, just mark it as "already known". If you already know all the words it's showing you, that will cause the difficulty to ramp up very quickly as it starts skipping ahead to find words you might not know. (If you scroll down to the bottom of the page and open the "graphs" section, you can see the logic behind it.)

The app "only" includes about the 3,000 most common words, so if you're past that level, I don't know how helpful it will be to you. I can easily extend this in the future, I just need bigger corpus with more data.

The ramp up feels rather slow. For 3000 words, a binary search should take less than 12 steps to find the right level. Maybe it's because you add 5 new words each time, turning those 12 steps into 60 words instead?

Also, I'm confused that you say you would need a bigger corpus for more words, since your readme says that you use the OpenSubtitles data from OPUS. Their 2024 release has tens of millions of sentences for each language, which surely should be enough for tens of thousands of unique words?

Thanks for the feedback. I’ve tuned the initial ramp-up to be more aggressive and made it add words in smaller increments at first. Now, after adding 16 words and marking them all as known, it skips to the ~500th most common word. After adding 25, it skips to the ~1000th.

What you say about binary search is a good point. I initially used something closer to a straightforward binary search, but the ramp-up was too quick, and beginner users would end up adding a bunch of words that were way too advanced for their level. So I made it less aggressive to avoid overshooting, but I guess that has the opposite problem of taking longer for advanced users. I’ll think about what I can do about that.
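
For example, one direction I've been mulling over (illustrative sketch only, not what's in the app): an exponential probe with a damped growth factor, followed by binary refinement between the last known and first unknown rank. The damping is the knob - word knowledge isn't perfectly monotone in frequency rank, which is exactly why a pure binary search overshoots for beginners.

    def estimate_frontier(words_by_rank, knows, growth=1.6, first_jump=8):
        """Damped exponential probing, then binary refinement.
        Illustrative only - not Yap's actual placement logic."""
        lo, jump = 0, first_jump
        # Phase 1: jump ahead while the probed word is still known; the
        # growth factor (< 2) damps the ramp-up relative to pure doubling.
        while lo + jump < len(words_by_rank) and knows(words_by_rank[lo + jump]):
            lo += jump
            jump = int(jump * growth)
        hi = min(lo + jump, len(words_by_rank) - 1)
        # Phase 2: binary search between the last known rank and the
        # first unknown (or out-of-range) one.
        while lo + 1 < hi:
            mid = (lo + hi) // 2
            if knows(words_by_rank[mid]):
                lo = mid
            else:
                hi = mid
        return lo  # highest frequency rank the user reliably knows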

For the corpus, I prefer to use Neri’s sentence lists, as they’re much higher quality than OpenSubtitles - you’d be surprised at the problems it has. So I only use OpenSubtitles for Korean (because Neri’s sentence lists don’t include Korean).

It's cool - feels low pressure, since you can just say you don't know something or check the answers. You can just jump on, and it keeps your attention for a long time because of that.

yeah! I never thought of that perspective but I'm glad to hear it