I've been wondering along similar lines, although I am for all intents and purposes here a layman so apologies if the following is nonsensical.
I feel there are potential parallels between RAG and how human memory works. When we humans are prompted, I suspect we engage in some sort of relevant memory retrieval process and the retrieved memories are packaged up and factored in to our mental processing triggered by the prompt. This seems similar to RAG, where my understanding is that some sort of semantic search is conducted over a database of embeddings (essentially, "relevant memories") and then shoved into the prompt as additional context. Bigger context window allows for more "memories" to contextualise/inform the model's answer.
I've been wondering three things: (1) are previous user prompts and model answers also converted to embeddings and stored in the embedding database, as new "memories", essentially making the model "smarter" as it accumulates more "experiences" (2) could these "memories" be stored alongside a salience score of some kind that increases the chance of retrieval (with the salience score probably some composite of recency and perhaps degree of positive feedback from the original user?) (3) could you take these new "memories" and use them to incrementally retrain the model for, say, 8 hours every night? :)
Edit: And if you did (3), would that mean even with a temperature set at 0 the model might output one response to a prompt today, and a different response to an identical prompt tomorrow, due to the additional "experience" it has accumulated?