Hacker News

I think your 3k figure comes from here, where it is explained:

> As judges, the professors then completed 2,918 blinded, forced-choice comparisons (median per judge: 200), each time indicating which of the two anonymized responses, from the instructor or the LLM, they would rather give to a student

IshKebab 11 hours ago

So were the answers fact-checked? If not, that seems like a pretty obvious flaw!

epolanski 7 hours ago

The study deliberately analyzes questions that don't have clear black-or-white answers; what matters is the reasoning.