I found a big problem with this - I noticed that the longest answer is very often the correct one, which kinda ruined the game. Even though I didn't want it to, it started affecting my decision-making. Luckily, I only noticed this around question 85, though those are really the tricky ones.
The good news for the project is that I think you can easily tweak the LLM to generate better alternatives.
I got 89/100, which extrapolates to 72,700. As a non-native speaker, I'm quite happy with that.
Yeah, it happened to me too. Once I noticed the pattern, I went straight for the longest one, and it was correct 90% of the time!