Elo scores for OCR don't really make much sense: they try to reduce accuracy to a single voting score without any real quality control on the reviewer/judge.
I think a more accurate picture of how these tools currently compare would come from a real-world benchmark with messy/complex docs across industries and languages.