All these agent memory systems seem simultaneously over- and under-engineered, and like a certain dead end. I cannot imagine any reality in which they don't rot and get out of sync with what the latest model needs. For the one time you build a payment provider, how many sessions will be tilted towards thinking about payments because of the "don't use Stripe" memory?
I've treated this as an information problem and written a small utility that explicitly does not store most things (https://github.com/skorokithakis/gnosis). The premise is: what the LLM already knows will always be there, so store nothing the LLM said; the code will always be there, so code-relevant things should be comments; but some things are neither, and those are never captured.
When we create anything, what we ended up not doing is often more important than what we did. My utility runs at the end of the session, captures the alternatives we rejected along with their rationales, and stores those as system knowledge.
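To make the idea concrete, here's a minimal sketch of what "store the rejected alternatives" could look like. This is purely illustrative, not gnosis's actual implementation; all names here (`RejectedAlternative`, `record`, the JSONL file) are made up for the example.

```python
# Hypothetical sketch: at session end, record what was rejected and why,
# so future sessions can look up the rationale instead of relitigating it.
import json
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class RejectedAlternative:
    topic: str        # area of the decision, e.g. "payments"
    alternative: str  # what was considered
    rationale: str    # why it was rejected

def record(entry: RejectedAlternative, log: Path) -> None:
    # Append-only JSON lines, so the knowledge accumulates across sessions.
    with log.open("a") as f:
        f.write(json.dumps(asdict(entry)) + "\n")

record(
    RejectedAlternative("payments", "Stripe", "fees too high for our volume"),
    Path("decisions.jsonl"),
)
```

The append-only log matters: it's the negative space of the project, searchable the same way the code is.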
In short, I want to capture the things my coworkers know but that I can't just grep the code for. So far it's worked well, but it's still early.