Especially for indie users/devs and smaller teams. I built a part of this (the retriever) in under 4 hours, https://github.com/itissid/wiki, to replace deepwiki.
I think the challenge is teaching people how ranking works more effectively, so that they can build it themselves and host it on their own.
Like the other day, someone who has worked in search explained to me why you would care about using a learning-to-rank (LTR) technique to train your own feature weights on your data. My understanding is that weighted features work better (retrieval-wise) on textual data than plain BM25 or vector-embedding DB indexing of text chunks with minimal preprocessing. So if you have lots of conversations, you can create a ton of features from them (like attributes of a conversation), and the ones that matter more will end up with larger weights. And you can use regularization (like L1) to kill the unimportant ones.
[EDIT]: IIUC, LTR is important because you likely want different features to matter more for different kinds of documents, e.g. what matters for codebase documentation is different from what matters for your personal journal.
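To make the LTR idea concrete, here is a minimal sketch of pairwise learning-to-rank with an L1 penalty, in plain Python. Everything in it is an assumption for illustration: the feature names (`bm25`, `embedding_cosine`, `message_length`), the toy preference pairs, and the hyperparameters. The optimizer is plain proximal gradient descent on a pairwise logistic loss, not any particular library's method. The toy data is built so that `message_length` carries no signal.

```python
import math

# Hypothetical feature names; in practice these are whatever you extract per document.
FEATURES = ["bm25", "embedding_cosine", "message_length"]

# Toy preference pairs: (features of the result a user preferred, features of the one skipped).
# Built by hand so that message_length is pure noise.
PAIRS = [
    ((0.9, 0.7, 0.6), (0.1, 0.2, 0.3)),
    ((0.9, 0.7, 0.3), (0.1, 0.2, 0.6)),
    ((0.6, 0.8, 0.9), (0.2, 0.2, 0.2)),
    ((0.6, 0.8, 0.2), (0.2, 0.2, 0.9)),
    ((0.8, 0.3, 0.5), (0.2, 0.4, 0.3)),
    ((0.8, 0.3, 0.3), (0.2, 0.4, 0.5)),
]


def soft_threshold(v, t):
    """Proximal step for the L1 penalty: shrink v toward 0, clamping small values to 0."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def train(pairs, l1=0.05, lr=0.5, steps=500):
    """Learn one weight per feature so preferred results score above skipped ones."""
    w = [0.0] * len(FEATURES)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for good, bad in pairs:
            diff = [g - b for g, b in zip(good, bad)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Derivative of the pairwise logistic loss log(1 + exp(-margin)).
            p = 1.0 / (1.0 + math.exp(margin))
            for j, dj in enumerate(diff):
                grad[j] -= p * dj / len(pairs)
        # Gradient step, then L1 shrinkage.
        w = [soft_threshold(wi - lr * gi, lr * l1) for wi, gi in zip(w, grad)]
    return dict(zip(FEATURES, w))


def rank(weights, candidates):
    """Order candidate ids by their weighted feature score, best first."""
    def score(feats):
        return sum(weights[name] * x for name, x in zip(FEATURES, feats))
    return sorted(candidates, key=lambda c: score(candidates[c]), reverse=True)
```

On this toy data, `train(PAIRS)` leaves `message_length` with a weight of zero while `bm25` stays positive, which is the "kill unimportant ones" effect. Training a separate weight vector per corpus (one for code docs, one for a journal) would be one way to get the per-document-type behavior from the edit above.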
I don't treat memory like RAG. That's the key. I only track decisions, actions, and outcomes.
Ah so you extract decisions, actions and outcomes and you index and search over them?
Yeah, after I tokenize them and embed them into vector form. Then it’s a simple cosine distance.
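A rough sketch of that tokenize, embed, compare flow, for anyone following along. This is not the commenter's actual code: the `embed` function here is a bag-of-words stand-in for a real embedding model, and the `MEMORY` records and their `kind`/`text` fields are made up. Cosine distance is just 1 minus the similarity computed below.

```python
import math
import re
from collections import Counter

# Hypothetical memory store: only decisions, actions, and outcomes are tracked.
MEMORY = [
    {"kind": "decision", "text": "Use SQLite instead of Postgres for the local cache"},
    {"kind": "action", "text": "Migrated the cache tables to SQLite"},
    {"kind": "outcome", "text": "Cold start time dropped after the cache migration"},
]


def embed(text):
    """Stand-in embedding: lowercase token counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query, memory, top_k=2):
    """Return the top_k records whose text is closest to the query."""
    q = embed(query)
    ranked = sorted(
        memory,
        key=lambda record: cosine_similarity(q, embed(record["text"])),
        reverse=True,
    )
    return ranked[:top_k]
```

For example, `recall("why did we pick sqlite instead of postgres", MEMORY, top_k=1)` returns the decision record, since it shares the most tokens with the query.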
The point about memory is that sometimes you remember great detail and sometimes you only remember that the memory exists, so having a good tool loop that attempts recall and tries permutations is good.
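One possible reading of "try permutations", sketched below: start with the full query and, if nothing comes back, retry with progressively smaller subsets of its terms. This interpretation is an assumption, as are the `NOTES` data and the `keyword_search` stand-in, which only plays the role of whatever retriever tool the loop would call.

```python
from itertools import combinations

# Hypothetical stored memories.
NOTES = [
    "decision: pin numpy below 2.0 until the plotting code is ported",
    "outcome: nightly build is green again after pinning numpy",
]


def keyword_search(terms, notes=NOTES):
    """Toy stand-in for a retriever tool: notes that contain every term."""
    return [note for note in notes if all(term in note for term in terms)]


def recall_loop(terms, search=keyword_search, max_attempts=20):
    """Try the full query, then smaller subsets of its terms, until something is found.

    Returns (hits, attempts); hits is None if every attempt came back empty.
    """
    attempts = 0
    for size in range(len(terms), 0, -1):
        for subset in combinations(terms, size):
            if attempts == max_attempts:
                return None, attempts
            attempts += 1
            hits = search(subset)
            if hits:
                return hits, attempts
    return None, attempts
```

With the query terms `["numpy", "upgrade", "decision"]`, the misremembered term "upgrade" blocks the first two attempts, and the third attempt (`numpy` plus `decision`) finds the decision note. That is the "I only remember that the memory exists" case: a vague or partly wrong cue still lands once the loop relaxes it.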