strong agree. I always have the LLM put an actual markdown doc in a docs/plans/ folder before starting work. I often, but not always, review it.
Aside: it also helps for code review! Review bots can point out the diff between plan and implementation.
Some examples for the curious: https://github.com/sociotechnica-org/symphony-ts/tree/main/d...
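If you want to try the convention, a minimal sketch (the folder layout matches the comment above; the file name and checklist contents are made up for illustration):

```shell
# Create the plans folder and drop a plan doc in it before starting work.
# File name and checklist items are hypothetical examples.
mkdir -p docs/plans
cat > docs/plans/retry-logic.md <<'EOF'
# Plan: retry logic
- [ ] add exponential backoff to client.py
- [ ] cover the retry path in tests
EOF
```

A review bot can then diff this doc against the eventual implementation.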
It's one of the things that surprised me when I first started using the compound engineering plugin.
I've been considering adding a review gate with a reviewing model solely tasked with identifying gaps between the plan and the implementation.
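A toy sketch of what such a gate could check mechanically before the reviewing model even runs, assuming plans are markdown checklists and using nothing smarter than string matching (all names and the file formats here are made up):

```python
# Toy plan-vs-implementation gap check (hypothetical plan format).
# Extract "- [ ]"-style checklist items from the plan, then flag any item
# whose mentioned file path never appears in the implementation diff.
import re

def plan_items(plan_md: str) -> list[str]:
    """Return checklist item texts from a plan markdown doc."""
    return re.findall(r"^- \[[ x]\] (.+)$", plan_md, flags=re.MULTILINE)

def gaps(plan_md: str, diff_text: str) -> list[str]:
    """Items whose referenced path (anything like foo/bar.py) is absent from the diff."""
    missing = []
    for item in plan_items(plan_md):
        paths = re.findall(r"[\w./-]+\.\w+", item)
        if paths and not any(p in diff_text for p in paths):
            missing.append(item)
    return missing

plan = """# Plan
- [ ] add retry logic to client.py
- [ ] update docs/usage.md
"""
diff = "--- a/client.py\n+++ b/client.py\n+    retries = 3\n"
print(gaps(plan, diff))  # -> ['update docs/usage.md']
```

A reviewing model would catch subtler gaps than path matching, but even this cheap pass narrows what the model has to look at.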
> file-based state that persists between agent invocations
Can you expand on this with a practical example?
One example: I have the agent distill the essence of all previous discussions into a spec.md file, check it for completeness, and then drop all previous context before continuing.
It needs a canonical source of truth, something isolated agents can't easily provide on their own. There are tools out there like specularis that help you do that and keep specs in sync.
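In code, that loop is roughly the following sketch, where call_model is a hypothetical stand-in for whatever LLM client you use and spec.md is the only thing that survives between runs:

```python
# Sketch of file-based state between agent invocations.
# call_model is a hypothetical stand-in for an LLM call; only spec.md
# persists from one session to the next.
from pathlib import Path

SPEC = Path("spec.md")

def end_session(transcript: str, call_model) -> None:
    """Distill the whole discussion into spec.md; the transcript can then be dropped."""
    SPEC.write_text(call_model(f"Distill this discussion into a complete spec:\n{transcript}"))

def start_session(call_model) -> str:
    """A fresh agent starts from the spec alone, with no prior chat context."""
    return call_model(f"Implement this spec:\n{SPEC.read_text()}")
```

The point is that the spec file, not the chat history, is the canonical state: any agent can pick it up cold.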
thanks
...at least until we get real Test-Time Training (TTT) that encodes the state into model weights. If vast amounts of human knowledge can be compressed into ~400GB of frontier-model weights, it's easy to imagine doing the same for our entire context.