Hacker News

Re 1 - that is a very kind offer! Our current public template library is very limited, so let me come back to you on this.

2. We see exactly the same thing. There is a trade-off between correctness and token burn. However, some models are cheaper and faster per token than others, so the small pieces can benefit from that. Token usage is also surprisingly variable, because it depends on the information density of both the document and the question (e.g. is it a single needle in a haystack, or are we analyzing the entire haystack from 10 perspectives?). So parsing 1k pages may take on the order of millions of tokens, while a series of queries (extractions) on top of it could take 1-2 orders of magnitude more.
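
To make those orders of magnitude concrete, here is a rough back-of-envelope sketch. The 2,000 tokens per page figure and the `estimate_tokens` helper are purely illustrative assumptions, not measured numbers from our system; only the 1k pages and the 10x to 100x query multiplier come from the estimate above.

```python
def estimate_tokens(pages: int, tokens_per_page: int, query_multiplier: int) -> tuple[int, int]:
    """Return (parse_tokens, query_tokens) for a rough budget estimate.

    tokens_per_page is an assumed average; real documents vary a lot
    with information density.
    """
    parse_tokens = pages * tokens_per_page
    # Queries (extractions) on top of the parsed output are assumed to cost
    # a fixed multiple of the parsing cost.
    query_tokens = parse_tokens * query_multiplier
    return parse_tokens, query_tokens


PAGES = 1_000
ASSUMED_TOKENS_PER_PAGE = 2_000  # hypothetical average, not a measurement

# 1 order of magnitude more (10x) and 2 orders of magnitude more (100x)
parse_tokens, query_low = estimate_tokens(PAGES, ASSUMED_TOKENS_PER_PAGE, 10)
_, query_high = estimate_tokens(PAGES, ASSUMED_TOKENS_PER_PAGE, 100)

print(f"parsing: {parse_tokens:,} tokens")
print(f"queries: {query_low:,} to {query_high:,} tokens")
```

Under these assumptions the parsing lands in the low millions of tokens and the query series in the tens to hundreds of millions, which is why routing the small pieces to cheaper models matters.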