Our system uses sub-agents as a core part of its architecture.

That terminology can be confusing, because elsewhere (and sometimes in our own architecture, such as when executing thousands of operations via MAP) a sub-agent is a smaller model given simpler individual tasks.

But the core mechanism we use for simulating unlimited context is to allow the main model to spin up instances of itself (sub-agents) with the previously summarized portion of the context expanded into its full, uncompressed state.

Expanding summaries into full text in sub-agents rather than the main thread is a critical part of our architecture, because it prevents the main context window from filling up.
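
To make the idea concrete, here is a minimal sketch of the pattern described above, not our actual implementation. Every name in it (MainContext, compress, ask_sub_agent, toy_model) is hypothetical, and the "model" is a stub that truncates its prompt. What it shows is that the full text only ever appears in the sub-agent's prompt, while the main window receives just a summary and a short answer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
Model = Callable[[str], str]


def toy_model(prompt: str) -> str:
    # Stub for illustration only: returns a short, truncated string.
    return prompt[:30]


@dataclass
class MainContext:
    # Full text of summarized chunks, kept outside the main window.
    store: Dict[str, str] = field(default_factory=dict)
    # What the main model actually sees: summaries and short answers.
    window: List[str] = field(default_factory=list)

    def compress(self, chunk_id: str, full_text: str, model: Model) -> None:
        """Keep the full text aside and put only a summary in the window."""
        self.store[chunk_id] = full_text
        summary = model(f"Summarize briefly:\n{full_text}")
        self.window.append(f"[summary {chunk_id}] {summary}")

    def ask_sub_agent(self, chunk_id: str, question: str, model: Model) -> str:
        """Run a sub-agent on the expanded chunk; keep only its answer."""
        # The uncompressed text appears only in the sub-agent's prompt.
        sub_prompt = f"{self.store[chunk_id]}\n\nQuestion: {question}"
        answer = model(sub_prompt)
        self.window.append(f"[answer from {chunk_id}] {answer}")
        return answer

    def window_size(self) -> int:
        """Rough size of the main window, in characters."""
        return sum(len(entry) for entry in self.window)


ctx = MainContext()
ctx.compress("c1", "x" * 10_000, toy_model)
ctx.ask_sub_agent("c1", "What does this chunk say?", toy_model)
print(len(ctx.window), ctx.window_size())
```

In this toy version the 10,000-character chunk never enters the main window. A real system would pass the same model to both calls and would need some policy for when to summarize and when to expand, which this sketch leaves out.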