Extracting text from DOCX is easy. Anything related to layout is non-trivial and extremely brittle.

To get the layout correct, you need to reverse engineer details down to Word's numerical accuracy so that content appears at the correct position in more complex cases. People like creating brittle documents where a pixel of difference can break the layout and cause content to misalign and appear on separate pages.

This will be a major problem for cases like the text saying "look at the above picture" but the picture was not anchored properly and floated to the next page due to rendering differences compared to a specific version of Word.