How do you know where to slice an image? What if you slice an image mid-word?

I calculate* the appropriate overlap, and each slice repeats that much of the previous one, so text cut at a boundary shows up whole in the next slice. There is some post-processing assembly required, but it's trivial.

[*] A SWAG at line height, plus trial and error to figure out the right amount of overlap given LLM error rates, etc.
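
For anyone who wants to see the shape of it, here is a rough sketch of overlapping slices plus the reassembly step. It is not my actual code: it assumes horizontal strips measured in pixel rows, the names slice_bounds and merge_lines are made up for illustration, and the exact-match merge assumes the model transcribes the repeated lines identically in both slices, which real output often won't (a fuzzy comparison would be needed there).

```python
def slice_bounds(image_height, slice_height, overlap):
    """Return (top, bottom) pixel rows for strips that overlap their predecessor.

    Hypothetical helper: `overlap` is the number of rows repeated from the
    previous strip, e.g. a guessed line height times a couple of lines.
    """
    if image_height <= 0:
        raise ValueError("image_height must be positive")
    if not 0 <= overlap < slice_height:
        raise ValueError("overlap must be >= 0 and smaller than slice_height")

    step = slice_height - overlap  # how far each new strip advances
    bounds = []
    top = 0
    while True:
        bottom = min(top + slice_height, image_height)
        bounds.append((top, bottom))
        if bottom >= image_height:
            break
        top += step
    return bounds


def merge_lines(prev, nxt):
    """Join two transcripts (lists of text lines), dropping the repeated seam.

    Looks for the longest run of lines that ends `prev` and also starts `nxt`.
    Assumes exact matches; falls back to plain concatenation if none is found.
    """
    for k in range(min(len(prev), len(nxt)), 0, -1):
        if prev[-k:] == nxt[:k]:
            return prev + nxt[k:]
    return prev + nxt


if __name__ == "__main__":
    print(slice_bounds(1000, 400, 100))
    print(merge_lines(["a", "b", "c"], ["b", "c", "d"]))
```

Each (top, bottom) pair can be handed to whatever crop call your imaging library has. The overlap only has to be big enough that any line cut at one boundary is complete in the next strip.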

Interesting. Do you have a uniform data set? E.g. documents of a specific type that you know consistently have similar formats, or is this tuning something you need to do per-document?