Sort of... Claude Code writes to a memory.md file that it uses to store important information across conversations. If I review mine, it has plenty of details about things like coding conventions, structure, and the overall architecture of the application it's working on.
The second thing Claude Code does is that when it reaches the end of its context window, it runs /compact on the session, which writes a summary of the current session to a file and then starts a new session from that summary. It also retains logs of all the previous sessions, which it can use and search through.
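To make the compaction idea concrete, here's a toy sketch in Python. This is not Claude Code's implementation: the `Session` class, the word-count "tokenizer", and the `summarize` placeholder are all made up for illustration. It only shows the shape of the loop described above: when the budget is exceeded, archive the full log and restart from a summary.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def summarize(messages: list[str], words_per_message: int = 3) -> str:
    # Placeholder summarizer: keep the first few words of each message.
    # A real tool would ask the model to write the summary.
    return " | ".join(" ".join(m.split()[:words_per_message]) for m in messages)


@dataclass
class Session:
    limit: int  # context budget, in toy tokens
    messages: list[str] = field(default_factory=list)
    archive: list[list[str]] = field(default_factory=list)  # full logs of past sessions

    def used(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def add(self, text: str) -> None:
        self.messages.append(text)
        if self.used() > self.limit:
            self.compact()

    def compact(self) -> None:
        # Keep the full log around for later searching, then restart from a summary.
        self.archive.append(list(self.messages))
        self.messages = [summarize(self.messages)]


session = Session(limit=12)
session.add("refactor the parser module into smaller functions")
session.add("add unit tests for the tokenizer edge cases")
print(session.messages)      # the new session holds only the summary
print(len(session.archive))  # the old session is still available in full
```

The point of the sketch is the trade-off: the summary is lossy, so the archived logs are what make it possible to recover details later.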
Looking over my Claude Code session, out of the 256k tokens available, about 50k are used by "memory" and session summaries, leaving roughly 200k tokens to work with. The reality is that the vast majority of the tokens Claude Code uses go to its own internal reasoning rather than anything "front-end" facing, so to speak.
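As a quick back-of-the-envelope check on those numbers (the 256k and 50k figures are my rough estimates from one session, and `budget` is just a helper name I made up):

```python
def budget(window_tokens: int, overhead_tokens: int) -> tuple[int, float]:
    """Return (tokens left to work with, share of the window spent on overhead)."""
    remaining = window_tokens - overhead_tokens
    share = overhead_tokens / window_tokens
    return remaining, share


# Rough figures from the session above: 256k window, ~50k of memory + summaries.
remaining, share = budget(256_000, 50_000)
print(remaining)       # 206000
print(f"{share:.1%}")  # 19.5%

# Same overhead against a hypothetical 1 million token window.
remaining_1m, share_1m = budget(1_000_000, 50_000)
print(remaining_1m)       # 950000
print(f"{share_1m:.1%}")  # 5.0%
```

So the fixed overhead is about a fifth of a 256k window, and would shrink to a small fraction of a 1 million token one, assuming the memory and summaries don't grow along with it.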
Additionally, given that ChatGPT Codex just increased its context length from 256k to 1 million tokens, I expect Anthropic will release an update within a month or so to catch up with their own 1 million token model.
There are a few problems with that.
1. The closer the context gets to full, the worse it performs.
2. The more context it has, the less it weights individual items.
That is, Claude might learn you hate long functions and add a line about keeping functions short. When that line is the only thing in the context, Claude is likely to follow it very closely. But when it’s one piece of a much longer context, it is much more likely to be ignored.
3. Tokens cost money, even if you are currently being subsidized.
4. You have no idea how new models and new system prompts will perform with your current memory.md file.
5. Unlike learning something yourself, anything you teach Claude is likely to start being controlled by your employer. They might not let you take it with you when you go.
> 3. Tokens cost money even if you are currently being subsidized.
Keep in mind that those 50k memory tokens would likely be cached after the first run, and thus significantly cheaper.
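A rough sketch of why caching matters here. Every number below is an illustrative assumption, not a quoted price: the per-million-token rate, the cache write and read multipliers, and the turn count are placeholders, and the model assumes the cache stays warm for the whole session.

```python
def prefix_cost(
    prefix_tokens: int,
    turns: int,
    price_per_mtok: float,  # assumed base input price per million tokens
    write_mult: float,      # assumed multiplier for the first (cache-writing) turn
    read_mult: float,       # assumed multiplier for later (cache-reading) turns
) -> tuple[float, float]:
    """Return (cost without caching, cost with caching) for re-sending a fixed prefix."""
    per_turn = prefix_tokens * price_per_mtok / 1_000_000
    uncached = turns * per_turn
    cached = (write_mult + (turns - 1) * read_mult) * per_turn
    return uncached, cached


# Hypothetical session: 50k-token memory prefix re-sent on each of 20 turns.
uncached, cached = prefix_cost(
    prefix_tokens=50_000,
    turns=20,
    price_per_mtok=3.0,
    write_mult=1.25,
    read_mult=0.1,
)
print(f"uncached: {uncached:.2f}, cached: {cached:.2f}")  # uncached: 3.00, cached: 0.47
```

Under those assumptions the memory prefix costs a fraction of the uncached price, though it still isn't free, and caching does nothing for the dilution problem in point 2.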