> To do it, we completely abandon the PDF text-structure and only use the individual location of each letter. Then we combine letter positions to words, words to lines, and lines to text-blocks using a number of algorithms. We use the structure blocks that we generated with machine learning afterwards, so this is just the first step in analyzing the page.
Do you happen to have any sources for learning more about the piecing-together process, e.g. the overall pipeline and the algorithms involved? It sounds like an interesting problem to solve.
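
For context, here is my naive guess at what the first steps might look like. It is only a sketch and not your actual method. It assumes each letter comes with a bounding box and a baseline, that y grows downward, and that fixed tolerances are good enough. It also groups letters into lines first and then splits lines into words, which is simpler than the letters-to-words-to-lines order you describe. `Glyph`, `group_lines`, `split_words`, and both tolerance values are names and numbers I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x0: float  # left edge of the letter's box
    x1: float  # right edge of the letter's box
    y: float   # baseline position (assumed to grow downward)


def group_lines(glyphs, y_tol=2.0):
    """Cluster glyphs into lines: consecutive baselines within y_tol chain together."""
    lines = []
    for g in sorted(glyphs, key=lambda g: g.y):
        if lines and abs(lines[-1][-1].y - g.y) <= y_tol:
            lines[-1].append(g)
        else:
            lines.append([g])
    # Within each line, restore left-to-right reading order.
    return [sorted(line, key=lambda g: g.x0) for line in lines]


def split_words(line, gap_tol=1.5):
    """Start a new word wherever the horizontal gap between letters exceeds gap_tol."""
    words, current = [], [line[0]]
    for prev, g in zip(line, line[1:]):
        if g.x0 - prev.x1 > gap_tol:
            words.append(current)
            current = [g]
        else:
            current.append(g)
    words.append(current)
    return ["".join(g.char for g in w) for w in words]


def page_to_words(glyphs):
    """Return a list of lines, each a list of word strings."""
    return [split_words(line) for line in group_lines(glyphs)]
```

I would expect this to fall over on multi-column layouts, rotated text, and mixed font sizes, since the tolerances are absolute rather than scaled to the glyph size. That is exactly the part I would like to read more about.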