We do both:
We compress tool outputs at each step, so the cache isn't broken during the run. Once we hit the 85% context-window limit, we preemptively trigger a summarization step and load that when the context-window fills up.
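A minimal sketch of the two layers described above. All names (`compress_tool_output`, `summarize`, the assumed window size) and the 4-chars-per-token heuristic are illustrative assumptions, not the actual implementation:

```python
MAX_TOKENS = 128_000   # assumed context-window size (illustrative)
SUMMARIZE_AT = 0.85    # preemptive summarization threshold from the post


def count_tokens(messages):
    # Crude stand-in for a real tokenizer: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4


def compress_tool_output(output, limit=500):
    # Layer 1: compress each tool output as it arrives. Because only the
    # newest message is touched, the cached prompt prefix stays stable.
    if len(output) <= limit:
        return output
    return output[:limit] + " ...[truncated]"


def summarize(messages):
    # Stand-in for an LLM summarization call.
    return [{"role": "system",
             "content": f"Summary of {len(messages)} earlier messages."}]


def append_tool_result(history, output, state):
    # Layer 2: at 85% usage, build the summary preemptively, but keep
    # running on the full history; swap the summary in only once the
    # window actually fills up.
    history.append({"role": "tool", "content": compress_tool_output(output)})
    used = count_tokens(history)
    if used >= SUMMARIZE_AT * MAX_TOKENS and state.get("summary") is None:
        state["summary"] = summarize(history)
    if used >= MAX_TOKENS and state.get("summary") is not None:
        history[:] = state["summary"]
```

The point of precomputing the summary at 85% rather than summarizing at the moment of overflow is that the swap is instant when the limit is hit, instead of stalling the run on a summarization call.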
> we preemptively trigger a summarization step and load that when the context-window fills up.
How does this differ from auto compact? Also, how do you prove that yours is better than using auto compact?