There have been so many open-source OCR models in the last 3 months that it would be good to compare against those, especially since some are under 1B params and can run on edge devices:
- paddleOCR-VL
- olmOCR-2
- chandra
- dots.ocr
I'm kind of surprised there aren't more leaderboards or arenas for OCR and CV, or providers hosting those models. The space is neglected on both Artificial Analysis and OpenRouter.
Someone posted a project here about a month ago that compares models in head-to-head matchups, similar to LMArena:
https://www.ocrarena.ai/leaderboard
It hasn't been updated for Mistral yet, but so far Gemini seems to top the leaderboard.
OCR developers from decades past must be slapping their foreheads now that it seems users will wait a whole minute per page and be happy.
What they are happy about is accurate OCR.
Getting the wrong answer really quickly is not the best goal.
You can also sort by latency: dots.ocr has the lowest at 3.8s/page. And although it doesn't fare very well against the much larger, slower models, it's still streets ahead of traditional OCR techniques.
How can something have a very high ELO but a very low win rate?
You don't lose much Elo if your opponent is much stronger than you. Draws could in theory play a part as well.
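The standard Elo update makes this concrete: the points exchanged depend on the expected score, so a heavy underdog loses almost nothing from a defeat but gains a lot from an upset win, which is how a low win rate can coexist with a high rating. A minimal sketch of the usual formula (K=32, logistic expectation):

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a, r_b, score_a, k=32):
    """A's new rating after a game (score_a: 1 win, 0.5 draw, 0 loss)."""
    return r_a + k * (score_a - expected_score(r_a, r_b))

# A 1200-rated model losing to an 1800-rated one:
# expected score is ~0.03, so the loss costs under 1 point.
loss_delta = update(1200, 1800, 0) - 1200   # ~ -0.98
# ...but an upset win would gain ~31 points.
win_delta = update(1200, 1800, 1) - 1200    # ~ +31.0
```

So a model matched mostly against stronger opponents can rack up a respectable rating from occasional upsets while still losing the majority of its games.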
Very nice comparison! I'd like to see on which examples the OCR engines fail.
What I like about Mistral OCR is the simple pricing ($1/1k pages) with the API hosted on their servers. With other OCR models it's hard to compare pricing, because they're token-based and you don't know how many tokens an image is until you run your own test.
E.g. with Gemini 3.0 Flash it might seem that pricing increased only slightly compared to Gemini 2.5 Flash, until you test it and see that what used to be 258 input tokens per 384x384 tile is now around 3x more.
But they doubled the price for this new Mistral OCR 3 model, to $2/1k pages.
Simpler would be to bill per character.
Now I have to figure out how large a page can be.
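The comparison above can be sketched as a flat per-page price versus a token-metered price, where the token side has to be measured empirically. In this sketch, the $1/1k-pages flat rate and the 258-tokens-per-384x384-tile figure come from the comments above; the tiles-per-page count and the per-token rates are hypothetical placeholders, not real list prices:

```python
def flat_cost(pages, usd_per_1k_pages=1.00):
    """Flat per-page pricing, e.g. $1 per 1,000 pages."""
    return pages * usd_per_1k_pages / 1000

def token_cost(pages, tiles_per_page, tokens_per_tile=258,
               usd_per_m_input=0.10, output_tokens_per_page=1000,
               usd_per_m_output=0.40):
    """Token-based pricing: every parameter except tokens_per_tile
    is a placeholder you'd have to measure or look up yourself."""
    input_tokens = pages * tiles_per_page * tokens_per_tile
    output_tokens = pages * output_tokens_per_page
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1e6

# 1,000 pages at an assumed 8 image tiles per page:
flat = flat_cost(1000)          # 1.0 (USD)
metered = token_cost(1000, 8)   # ~0.61 (USD) under these placeholder rates
```

The point is less the specific numbers than the shape of the problem: the flat price is one multiplication, while the metered price depends on tiles per page and tokens per tile, which can change silently between model versions (as the 2.5 → 3.0 Flash example shows).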
I spent like three hours trying to get one of these running and then gave up. I think the paddleOCR one.
It took an hour and a half to install 12 gigabytes of PyTorch dependencies that can't even run on my device, and then it told me it had some sort of versioning conflict. (I think I was supposed to use uv, but I had run out of steam by that point.)
Maybe I should have asked Claude to install it for me. I gave Claude root on a $3 VPS, and it seems to enjoy the sysadmin stuff a lot more than I do...
Incidentally, I had a similar experience installing Open WebUI... It installed 12 GB of PyTorch crap. I rage-quit, deleted the whole thing, and replicated the functionality I actually needed in 100 lines of HTML... Too bad I can't do that with OCR ;)
gemini-cli is good for this sort of thing. You can just tell it "Find out why xyz.py doesn't run" and let it crunch. It will try reasonably hard to get you out of Python dependency hell, and (more important) it generally knows when to give up.
But yes, in general, you want to use uv. Otherwise, the next Python application you install WILL break the last one you installed.
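For reference, a typical uv workflow that keeps each tool in its own isolated environment (the package name here is illustrative; check each project's README for the real install target and any extra system dependencies):

```shell
# Install uv itself via the official installer script
curl -LsSf https://astral.sh/uv/install.sh | sh

# Give the OCR tool its own virtual environment
uv venv .venv
source .venv/bin/activate
uv pip install paddleocr    # or whatever the project's docs specify

# Or run a packaged CLI in a throwaway environment with uvx
uvx --from paddleocr paddleocr --help
```

Because each environment is isolated, installing the next 12 GB PyTorch stack can't break the last one.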
I suppose you could use gemini-cli as a substitute for proper Python virtual environment management, always letting it fix whatever broke since the last time you tried to run the program, but that'd be like burning down a rainforest to toast a marshmallow.
Actually, I just remembered, this was inside uv!
https://www.codesota.com/ocr