> How is it reasonable to render the PDF, rasterize it, OCR it, use AI, instead of just using the "quality implementation" to actually get structured data out?
Because the underlying "structured data" is never checked, while the visual output is checked by dozens of people.
"Truth" is what the meatbags call "truth" as seen by their squishy ocular balls; what the computer sees doesn't matter.
Your first mistake is thinking that computers "see the image". Your second is thinking that the output of OCR is somehow different from that of a PDF engine that renders the file into structured data/text.