The base model is Qwen2.5-VL-3B, and the announcement lists "Model can suffer from hallucination" as a limitation.

Seems a bit scary that the "source" text from the PDFs could actually be hallucinated.

Given that the input is an image and not the raw PDF, it's not completely unexpected.