This thread is about feeding text to the model as an image instead of as text, not about LLMs as an alternative to other kinds of OCR, so the accuracy bar is 100%.
I've had mixed results with LLMs for OCR: sometimes excellent (zero errors on a photo of my credit card bill), but poor when the source wasn't a printed page, sometimes even "reusing" the same image section for multiple extracted words!
FWIW, I highly doubt that LLMs have learnt to read pages purely from (page image, page text) training pairs - more likely, text-heavy image input is triggering some special OCR handling.