I've seen a lot of such systems come and go. One of my friends is working on probably the best (VC-funded) memory system right now.

The problem always is that when there are too many memories, the context gets overloaded and the AI starts ignoring the system prompt.

Definitely not a solved problem, and there need to be benchmarks to evaluate these solutions. Benchmarks themselves can be easily gamed and not universally applicable.

The armchair ML engineer in me says our current context-management approach is the issue. With a proper memory-management system wired up to its own LLM-driven orchestrator, memories should be pulled in and pushed out between prompts, and ideally in the middle of a “thinking” cycle. You can make this performant with vector databases and the like, but the core principle remains the same, and it is oft repeated by parents across the world: “Clean up your toys before you pull a new one out!”
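A minimal sketch of that orchestrator idea (all names hypothetical; `score_relevance` stands in for an LLM or vector-similarity call), where the working context has a fixed budget and memories are swapped in and out around each prompt:

```python
# Hypothetical sketch: swap memories in and out of a fixed context
# budget between prompts. Not any real system's API.

def score_relevance(memory: str, prompt: str) -> float:
    # Toy stand-in for an LLM/embedding relevance score:
    # fraction of the memory's words that also appear in the prompt.
    m, p = set(memory.lower().split()), set(prompt.lower().split())
    return len(m & p) / max(len(m), 1)

def manage_context(memories: list[str], prompt: str, budget: int) -> list[str]:
    """Pull relevant memories in, push irrelevant ones out,
    keeping at most `budget` items in the working context."""
    ranked = sorted(memories, key=lambda m: score_relevance(m, prompt), reverse=True)
    return ranked[:budget]

working = manage_context(
    ["user prefers dark mode", "user is learning Rust", "user owns a cat"],
    prompt="explain Rust lifetimes",
    budget=2,
)
```

The "clean up your toys" rule is the `budget` cap: nothing new comes in without something old going out.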

Also, since I thought about it for another 30 seconds: the “too many memories!” problem is imo the same problem as context management and compaction, and it requires the same approach: more AI telling AI what AI should be thinking about. De-rank “memories” in the context manager as irrelevant and don’t pass them to the outer context. If a memory is de-ranked often and not used enough, it gets purged.
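The de-rank-then-purge policy could be sketched like this (threshold and names are made up for illustration): memories that keep getting scored as irrelevant accrue strikes, and enough strikes means they're dropped.

```python
# Hypothetical sketch of a de-rank/purge policy.

PURGE_AFTER = 3  # strikes before a memory is dropped (arbitrary threshold)

def update_memory_store(store: dict[str, int], relevant: set[str]) -> dict[str, int]:
    """`store` maps memory text -> de-rank strike count.
    Memories used this turn reset to 0 strikes; unused ones gain a
    strike; anything reaching PURGE_AFTER strikes is purged."""
    updated = {}
    for memory, strikes in store.items():
        strikes = 0 if memory in relevant else strikes + 1
        if strikes < PURGE_AFTER:
            updated[memory] = strikes
    return updated

store = {"likes jazz": 2, "knows Python": 0}
store = update_memory_store(store, relevant={"knows Python"})
# "likes jazz" reaches 3 strikes and is purged
```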

Fair concern.

ReadMe does support loading memories mid-reasoning! It is simply an agent reading files.

Although GPT-5.4 currently likes to explore a lot upfront and only then respond. But that is more a model behaviour (adjustable through prompting) than an architectural limitation.

Ah, I mean bi-directional management of context. Add and remove. Basically just the remove bit since we have adding down.

I see your point.

A removal mechanism is not (yet) implemented. But in principle, we could adjust the instructions in Update.md so that it does a minor "refactor" of the filesystem each day; newer abstractions can then form while irrelevant ones get pruned/edited. That's the beauty of the architecture: you define how the update occurs!

But if you do have a new memory (one that possibly contradicts an old one), is it really a good idea to prune/edit the old one?

If you are genuinely uncertain between choice A and B, then having them both exist in the memory archive might be a feature. The agent gets the possibility of seeing contradictory evidence on different dates, which communicates indecisiveness.

Do you remember the day you learned how to perform long division?

The purpose of memory pruning is not to “forget” useful or even contradictory information, but to condense it, so that the useful bits of a memory take up less context and are more immediately accessible in situations that need them.

I don't remember such details, but as you suggest, it is a healthy kind of compression.

I address it by merging lower-level memories into more abstracted ones through a temporal, hierarchical filesystem: days -> months -> quarters -> years. Each time scale focuses on more "useful" context, since uncertain/contradictory information does not survive as it moves up in abstraction.

For example, a day-level memory might be: "The user learned how to divide 314 by 5 with long division on Jan 3rd 2017."

A year-level memory might be: "The user progressed significantly in mathematics during elementary school."

From the perspective of the LLM, it is easier to access the year-level memories because it requires fewer "cd" commands, and it only dives down into lower levels when necessary.
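The rollup from days to coarser scales could be sketched as follows (a toy illustration, not the actual implementation; `summarize` stands in for the LLM call that abstracts away uncertain or contradictory detail):

```python
# Hypothetical sketch of the temporal hierarchy: day-level memories
# are merged upward into one month-level memory per month.

def summarize(memories: list[str], scale: str) -> str:
    # Toy stand-in for an LLM summarizer.
    return f"[{scale}] abstracted from {len(memories)} lower-level memories"

def roll_up(day_memories: dict[str, list[str]]) -> dict[str, str]:
    """Merge day-level entries (keys like 'YYYY-MM-DD') into
    month-level summaries (keys like 'YYYY-MM')."""
    by_month: dict[str, list[str]] = {}
    for day, memories in day_memories.items():
        by_month.setdefault(day[:7], []).extend(memories)
    return {month: summarize(mems, "month") for month, mems in by_month.items()}

months = roll_up({
    "2017-01-03": ["learned long division: 314 / 5"],
    "2017-01-19": ["practised multiplication tables"],
    "2017-02-02": ["struggled with fractions"],
})
```

The same merge applies again for months -> quarters -> years, so the agent's default view is the coarsest level and it only descends when it needs detail.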

Mid-thinking-cycle seems dangerous, as it will probably kill caching.

The mid-thinking-cycle approach would require a significant architectural change to the current state of the art, and imo is a key blocker to AGI.

Context bloat is real, but the architecture has the potential to solve it.

You need clever naming for the filesystem and an exploration policy in AGENTS.md (not trivial!).

The benchmark is definitely the core bottleneck. I don't know any good benchmark for this, probably an open research question in itself.

What is the memory system you are referring to? I've been trying Memori with OpenClaw. Haven't had a ton of time to really kick the tires on it, so the jury's still out.