Hacker News

Does it hallucinate with the LLM being used?

Sometimes. I just fed the Hugging Face demo an image containing some rather improbable details [1] and it OCRed "Page 1000000000000" with one extra trailing zero.

Honestly I was expecting the opposite: a repetition penalty kicking in after the zero had been repeated too many times, resulting in too few zeros. Apparently not. So you might want to steer clear of this model if your document has a trillion pages.
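For anyone wondering what I mean by a repetition penalty, here is a minimal sketch, assuming the common CTRL-style rule (logits of tokens that have already been generated are divided by the penalty if positive, multiplied by it if negative). The function name, the three-token vocabulary, and all the numbers are made up for illustration; I don't know what penalty, if any, the demo actually applies.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of logits with already-generated tokens made less likely."""
    penalized = list(logits)
    for token_id in set(generated_ids):
        score = penalized[token_id]
        # Positive scores shrink, negative scores become more negative.
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized


# Hypothetical 3-token vocabulary: 0 -> "0", 1 -> "1", 2 -> end of number.
logits = [2.0, -1.0, 1.9]
history = [1, 0, 0, 0]  # "1000" emitted so far

penalized = apply_repetition_penalty(logits, history, penalty=1.2)
# Pick the highest-scoring token after the penalty (greedy decoding).
next_token = max(range(len(penalized)), key=penalized.__getitem__)
```

With these toy numbers the unpenalized pick would be another "0", while the penalized pick is the end-of-number token, which is the "too few zeros" failure I had in mind. Note that in this form the penalty is applied once per distinct token, so it does not grow with the number of repeats.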

Other than that, it did a solid job - I've certainly seen worse attempts to OCR a table.

[1] https://imgur.com/a/8rJeHf8

nattaylor 16 days ago

The base model is Qwen2.5-VL-3B, and the announcement lists "Model can suffer from hallucination" as a limitation.

gibsonf1 16 days ago

Seems a bit scary that the "source" text from the PDFs could actually be hallucinated.

prats226 15 days ago

Given that the input is an image and not the raw PDF, it's not completely unexpected.