1. Incredible! Can I make an unsolicited ask? If you had industry-specific templates for standardized PDFs, it would be easier for me to send Parsewise to the insurance companies I've worked for. Something similar to https://www.useanvil.com/forms/?type=pdf-templates but with your clean, semantic data model.

2. Can I ask how? When I was building something like this, I realized there's an element of burning tokens for correctness. That is, you split things into small units and small processes, each producing a separate LLM output to be combined later. For a 1k page document, what kind of token usage do you see?

Re 1 - that is a very kind offer! Our current public template library is very limited, so let me come back to you on this.

Re 2 - we see exactly the same thing. There is a trade-off between correctness and token burn. However, some models are cheaper and faster per token than others, so the small pieces can benefit from that. Token usage is also surprisingly variable, because it depends on the information density of the document and of the question (e.g. is it a single needle in a haystack, or are we analyzing the entire haystack from 10 perspectives?). So parsing 1k pages may be on the order of millions of tokens, while a series of queries (extractions) on top of it could be 1-2 orders of magnitude more.
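
To make the orders of magnitude concrete, here is a rough back-of-envelope sketch in Python. Every number in it is an assumption picked for illustration (1,500 tokens per page, a 2x overhead for parsing passes), not a measured figure, and `estimate_tokens` is a hypothetical helper, not part of any product API. Only the "10x to 100x for queries" range comes from the estimate above.

```python
def estimate_tokens(pages, tokens_per_page=1500, parse_overhead=2,
                    query_factor_low=10, query_factor_high=100):
    """Back-of-envelope token estimate for parsing plus querying a document.

    tokens_per_page and parse_overhead are illustrative assumptions.
    The query factors reflect the "1-2 orders of magnitude" range.
    """
    # Tokens spent parsing: raw page tokens times an assumed overhead
    # for prompts, outputs, and repeated passes over small units.
    parsing = pages * tokens_per_page * parse_overhead
    return {
        "parsing": parsing,
        "queries_low": parsing * query_factor_low,
        "queries_high": parsing * query_factor_high,
    }


estimate = estimate_tokens(pages=1000)
for name, tokens in estimate.items():
    print(f"{name}: {tokens:,} tokens")
```

Under these assumed inputs, parsing 1k pages lands in the low millions of tokens and the query phase in the tens to hundreds of millions. Real usage will move a lot with document and question density.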