Sometimes. I just fed the Hugging Face demo an image containing some rather improbable details [1] and it OCRed "Page 1000000000000" with one extra trailing zero.
Honestly I was expecting the opposite - a repetition penalty kicking in after so many repeated zeros and leaving too few of them - but apparently not. So you might want to steer clear of this model if your document has a trillion pages.
Other than that, it did a solid job - I've certainly seen worse attempts to OCR a table.
[1] https://imgur.com/a/8rJeHf8
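For anyone wondering what I meant by a repetition penalty: here is a rough sketch of the common CTRL-style variant, where the logit of any token already generated gets pushed down before the next token is picked. I have no idea whether the demo enables one at all, and the function name, the two-token vocabulary and the numbers are all made up for illustration.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of logits with already-generated tokens made less likely."""
    out = list(logits)
    # Each seen token is penalized once, no matter how often it was repeated.
    for tok in set(generated_ids):
        # Dividing a positive logit or multiplying a negative one both lower it.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


# Hypothetical toy vocabulary: index 0 is "0", index 1 ends the number.
logits = [2.0, 1.8]
history = [0] * 11  # eleven zeros emitted so far
penalized = apply_repetition_penalty(logits, history)

# Greedy choice without and with the penalty.
choice_plain = max(range(len(logits)), key=lambda i: logits[i])
choice_penalized = max(range(len(penalized)), key=lambda i: penalized[i])
```

In this toy case the unpenalized pick is another "0", while the penalized pick ends the number early, which is the "too few zeros" failure I was expecting. Note that in this variant the penalty does not grow with the number of repeats, so a long run of zeros is punished no harder than a single one.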
The base model is Qwen2.5-VL-3B, and the announcement lists as a limitation: "Model can suffer from hallucination".
Seems a bit scary that the "source" text from the PDFs could actually be hallucinated.
Given that the input is an image and not the raw PDF, it's not completely unexpected.