Interesting approach! One question though: can the model do column detection?
The first OCR example returns output that does not detect the article columns - the bounding box is the entire first line.
Interesting approach! One question though: can the model do column detection?
The first OCR example returns output that does not detect the article columns - the bounding box is the entire first line.
It can, you could try prompting the model to use object detection vision and text extraction, we realized when we purely extract text it does amazing at word/sentence level bounds since the text acts as the anchor. However, when you treat it as a object detection problem, it sees that chunk of text as a segment allowing you the extract it as one column bound. Give that a try.