In the past, they just ran Deepseek OCR on your image and extracted the text, then gave it to a language only model. I believe now there is a model that actually takes images as input directly.
In the past, they just ran Deepseek OCR on your image and extracted the text, then gave it to a language only model. I believe now there is a model that actually takes images as input directly.