strong agree. I always have the LLM put an actual markdown doc in a docs/plans/ folder before starting work. I often, but not always, review it.

Aside: it also helps for code review! Review bots can point out the diff between plan and implementation.

Some examples for the curious: https://github.com/sociotechnica-org/symphony-ts/tree/main/d...

It's one of the things that surprised me when I first started using the compound engineering plugin.

I've been considering adding a review gate with a reviewing model solely tasked with identifying gaps between the plan and the implementation.
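A minimal sketch of such a gate, assuming a hypothetical `call_model` wrapper around whatever LLM API you use (all names here are illustrative, not from any specific plugin):

```python
from pathlib import Path


def build_review_prompt(plan: str, diff: str) -> str:
    """Build a prompt that asks a reviewing model solely to find
    gaps between the written plan and the implementation diff."""
    return (
        "You are a review gate. Compare the plan to the implementation diff.\n"
        "List every item in the plan that the diff does not implement,\n"
        "and every change in the diff that the plan does not cover.\n\n"
        f"PLAN:\n{plan}\n\nDIFF:\n{diff}\n"
    )


def call_model(prompt: str) -> str:
    """Hypothetical wrapper around your LLM API of choice."""
    raise NotImplementedError


def review_gate(plan_path: str, diff: str) -> str:
    """Read the plan from docs/plans/ and ask the reviewer for gaps."""
    plan = Path(plan_path).read_text()
    return call_model(build_review_prompt(plan, diff))
```

The reviewing model sees only the plan and the diff, not the authoring agent's conversation, which is the point: it can't be anchored by the rationalizations that produced the code.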

> file-based state that persists between agent invocations

Can you expand on this with a practical example?

One example: I let the agent distill the essence of all previous discussions into a spec.md file, check it for completeness, and remove all previous context before continuing.

It needs a canonical source of truth, something isolated agents can't provide easily. There are tools out there like specularis that help you do that and keep specs in sync.
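As a rough sketch of that file-based pattern (the docs/plans/ path matches the workflow above; the function names are assumptions, not any particular tool's API):

```python
from pathlib import Path


def persist_spec(spec_path: Path, summary_md: str) -> None:
    """Write the distilled discussion to disk so it survives
    between agent invocations."""
    spec_path.parent.mkdir(parents=True, exist_ok=True)
    spec_path.write_text(summary_md)


def fresh_context(spec_path: Path) -> str:
    """Start a new invocation with only the spec as context,
    discarding all prior chat history."""
    return spec_path.read_text() if spec_path.exists() else ""
```

The spec file, not the conversation, is the canonical source of truth: every fresh agent run reads the same document, so isolated agents stay in sync without sharing a context window.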

thanks

...at least until we get real Test-Time Training (TTT) that encodes the state into model weights. If vast amounts of human knowledge can be compressed into ~400GB for frontier models, it's easy to imagine the same for our entire context.