What about deterministic parsing?
Basically, using templates to extract info from recurring document structures?
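
To make the idea concrete, here is a rough sketch of what I mean, assuming the documents are plain text with a fixed layout. The invoice layout, the field names, and the `extract_fields` helper are all made up for illustration; a real template would be written against your actual documents.

```python
import re

# Hypothetical template for a recurring invoice layout. The field names and
# patterns are illustrative assumptions, not taken from any real document set.
INVOICE_TEMPLATE = {
    "invoice_number": re.compile(r"^Invoice\s+No\.?:\s*(\S+)", re.MULTILINE),
    "date": re.compile(r"^Date:\s*(\d{4}-\d{2}-\d{2})", re.MULTILINE),
    "total": re.compile(r"^Total:\s*\$?([\d,]+\.\d{2})", re.MULTILINE),
}


def extract_fields(text, template):
    """Apply each field's pattern to the text; fields with no match map to None."""
    result = {}
    for field, pattern in template.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


# Made-up sample document that follows the assumed layout.
SAMPLE = """Invoice No: INV-1042
Date: 2024-03-15
Total: $1,250.00
"""

fields = extract_fields(SAMPLE, INVOICE_TEMPLATE)
print(fields)
```

The same input always gives the same output, and a field that does not match comes back as `None` instead of a guess, so layout drift shows up as missing values you can check for.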