The graph-building agent processes raw files (such as emails) in batches. For each batch it receives two inputs: a lightweight index of the entire knowledge graph, and the raw source files in that batch.

Before each batch, we rebuild an index of all existing entities (people, orgs, projects, topics), including their aliases and key metadata. That index plus the batch’s raw content goes into the prompt. The agent can also call tools to read full notes or search existing knowledge for entity mentions when it needs more detail than the index provides.
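As a rough illustration of the index-plus-batch prompt, here is a minimal sketch. The `Entity` record, its field names, the one-line-per-entity layout, and the `build_index` / `build_prompt` helpers are all assumptions made for the example, not the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity record; the field names are illustrative only."""
    name: str
    kind: str  # e.g. "person", "org", "project", "topic"
    aliases: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


def build_index(entities: list[Entity]) -> str:
    """Render one compact line per entity: kind, name, aliases, key metadata."""
    lines = []
    for e in sorted(entities, key=lambda e: (e.kind, e.name)):
        aliases = ", ".join(e.aliases) or "-"
        meta = "; ".join(f"{k}={v}" for k, v in sorted(e.metadata.items())) or "-"
        lines.append(f"[{e.kind}] {e.name} | aliases: {aliases} | {meta}")
    return "\n".join(lines)


def build_prompt(index: str, batch: list[str]) -> str:
    """Combine the lightweight index with the raw content of the current batch."""
    sources = "\n\n".join(
        f"--- source {i + 1} ---\n{doc}" for i, doc in enumerate(batch)
    )
    return f"EXISTING ENTITIES\n{index}\n\nNEW SOURCES\n{sources}"
```

Keeping each entity to a single line is one way to keep the index small; full notes stay out of the prompt and are fetched through tools only when needed.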

It’s effectively multi-pass: we process in batches and rebuild the index between batches, so later batches see entities created earlier. That keeps context manageable while still letting the graph converge over time.
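The batch loop could look roughly like the sketch below. Everything here is a simplification for illustration: the graph is reduced to a dict mapping entity names to the sources that mention them, the index is just the sorted entity names, and `agent` is a hypothetical callable standing in for the real LLM call.

```python
from typing import Callable, Iterator

# Simplified stand-in for the knowledge graph: entity name -> source ids.
Graph = dict[str, set[str]]
# Hypothetical agent signature: (index, batch) -> updates to merge into the graph.
Agent = Callable[[list[str], list[str]], Graph]


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batches(files: list[str], batch_size: int, graph: Graph, agent: Agent) -> None:
    """Process files batch by batch, rebuilding the index before each batch."""
    for batch in batched(files, batch_size):
        # Rebuilt every iteration, so entities created by earlier
        # batches are visible to later ones.
        index = sorted(graph)
        updates = agent(index, batch)
        for name, sources in updates.items():
            graph.setdefault(name, set()).update(sources)
```

Because the index is recomputed inside the loop, the prompt size depends on the index and one batch rather than on the whole corpus.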