I've found the same thing. I build with Claude Code daily, and the context decay is real: by the end of a long session it starts forgetting decisions we made earlier. The 1M context window should help, but I'm curious how coherence holds up at that scale.

What's been working for me is keeping a CLAUDE.md file in my project root with key decisions and context. The model reads it at the start of every session so I don't have to re-explain everything. Not as elegant as automated compaction but it works.

> I build with Claude Code daily, and the context decay is real: by the end of a long session it starts forgetting decisions we made earlier

I generate task.md files before working on anything; some are short, others are very long with many steps. The models don't deviate anymore. One trick is a PostToolUse hook that shows the first open gate ("- [ ]") line from task.md on each tool call. This keeps the agent on track for hundreds of gates.
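A minimal sketch of that hook helper, assuming a plain Python script wired up as the hook's command (the file name, gate format, and output prefix are all illustrative, not the commenter's actual setup):

```python
# first_open_gate.py: print the first unchecked "- [ ]" gate from task.md
# so a PostToolUse hook command can surface it on every tool call.
# Hypothetical helper; names and format are assumptions.
from pathlib import Path


def first_open_gate(path="task.md"):
    """Return the first open gate line, or None if every gate is checked."""
    for line in Path(path).read_text().splitlines():
        if line.lstrip().startswith("- [ ]"):
            return line.strip()
    return None


if __name__ == "__main__" and Path("task.md").exists():
    gate = first_open_gate()
    if gate:
        print(f"NEXT GATE: {gate}")
```

The point of printing only the first open gate is that the agent sees exactly one next step after every tool call, instead of the whole plan.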

After each gate is executed we don't just check it off, we also append a few words of feedback. This turns task.md into a workbook covering intent, plan, execution, and even judgments. I see it like a programming language now: I can gate any task and the agent will do it, however many steps it takes. It can even generate new gates, or replan itself midway.
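The check-off-plus-feedback step can be sketched the same way (again a hypothetical helper, not the commenter's actual code; the note format is an assumption):

```python
# Flip the first open "- [ ]" gate to "- [x]" and append a short
# feedback note, so task.md accumulates a record of each judgment.
from pathlib import Path


def complete_gate(path="task.md", note=""):
    """Mark the first open gate done, optionally appending feedback."""
    lines = Path(path).read_text().splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("- [ ]"):
            lines[i] = line.replace("- [ ]", "- [x]", 1)
            if note:
                lines[i] += f"  ({note})"
            break
    Path(path).write_text("\n".join(lines) + "\n")
```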

You can enforce strict testing policies just by leaning into the programmability of gates: after each work gate, add a test gate, and have judges review testing quality and propose more tests.
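As a hedged illustration of that pattern (the gate wording is invented), a work/test/judge triple in task.md might look like:

```markdown
- [ ] Implement retry logic in fetch_page()
- [ ] Test: cover timeout, 500, and success paths for fetch_page()
- [ ] Judge: review the new tests; propose missing cases as new gates
```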

The task.md file is like a script or pipeline. It is also like a first-class function: it can even ingest other task.md files for regular reflection. A gate can create or modify gates or tasks, and so can a task.
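Gate self-modification can be sketched as one more hypothetical helper: inserting freshly proposed gates right after the gate that produced them (the anchor-by-substring lookup is an assumption for brevity):

```python
# Insert new "- [ ]" gates immediately after the line containing
# after_text, so a gate (or judge) can extend the plan it is part of.
from pathlib import Path


def insert_gates(path, after_text, new_gates):
    """Add new open gates right after the first line matching after_text."""
    lines = Path(path).read_text().splitlines()
    for i, line in enumerate(lines):
        if after_text in line:
            lines[i + 1:i + 1] = [f"- [ ] {g}" for g in new_gates]
            break
    Path(path).write_text("\n".join(lines) + "\n")
```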
