Very, very nice.

Since January, I’ve been having Claude build a static Japanese-English dictionary in which all of the kanji and jukugo can be displayed either with or without furigana:

https://www.tkgje.jp/index.html

I haven’t spotted any mistakes in the furigana myself, though there must be some. I have a scheduled routine that runs multiple times a day to have Claude check and polish existing entries; it should be correcting most of the furigana mistakes in the data. At some point, I will set up an agent that uses a different LLM to run a similar set of checks and reduce the error rate further.

As you note, the readings of Japanese words depend on the context, so producing accurate furigana cannot be done entirely programmatically. Sentences must be interpreted semantically.
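
To make that concrete, here is a minimal Python sketch of why a plain lookup table is not enough. The `READINGS` table and both function names are made up for illustration and are not taken from the dictionary's code or data; the point is only that some common words have more than one valid reading, so a context-free lookup has to give up on them.

```python
# Hypothetical toy lexicon: surface form -> possible readings.
# Not taken from the dictionary's data; for illustration only.
READINGS = {
    "今日": ["きょう", "こんにち"],    # "today" vs. "nowadays"
    "一日": ["いちにち", "ついたち"],  # "one day" vs. "first of the month"
    "辛い": ["からい", "つらい"],      # "spicy" vs. "painful, hard"
    "辞書": ["じしょ"],                # "dictionary", only one reading
}


def is_ambiguous(word):
    """True if the lexicon lists more than one reading for the word."""
    return len(READINGS.get(word, [])) > 1


def lookup_furigana(word):
    """Return the reading only when the lexicon has exactly one.

    Returns None for unknown words and for words whose reading
    depends on context, since a lookup alone cannot choose.
    """
    candidates = READINGS.get(word, [])
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    for word in READINGS:
        reading = lookup_furigana(word)
        status = reading if reading else "ambiguous, needs context"
        print(f"{word}: {status}")
```

Choosing between the candidates for a word like 辛い requires knowing what the sentence is about, which is the step that needs semantic interpretation.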

I am releasing all of the dictionary data into the public domain, and anyone is free to fork it or adapt it however they like:

https://github.com/tkgally/je-dict-1

Thanks for sharing this. It looks like a really cool project, and making the data public domain is especially generous.

I especially like the dictionary + example sentence format. I haven’t found a really good Japanese-English dictionary for learners, and yours looks promising.

I’m curious how token-intensive the repeated Claude polishing runs are.

Quite token-intensive. I pay for the Max plan, and the regular dictionary runs consume maybe 20 percent of my weekly quota.
