This is such a basic thing nowadays, and ElasticSearch is massive overkill for it. Something like SQLite or LanceDB or basically any vector database is much more appropriate.
This seems to be coming from the “we must make ElasticSearch AI-compatible” department more than anything.
If you already have Elasticsearch, it makes sense to continue utilizing it.
Saying “just use SQLite” completely dismisses the idea that this is a _shared_ memory across teams. The ability to easily connect to the remote service and have everything “just work” pays dividends when you have dozens or hundreds of users.
I’m literally laughing at the root comment’s idea of proposing we replace ES with SQLite and imagining how that architecture review would go. Not everyone is doing MB/GB scale workloads.
That would be a pretty frail architecture too. I think I recall ES even saying not to rely on it for data persistence. Every time I've worked with ES, it was backed by some other database used as the source of truth.
This is an important bit of information which either gets lost or ignored for convenience at times. The other side of it is that this opens the door to having to keep two data stores in sync, which is a much bigger battle for a lot of small companies or teams.
ElasticSearch is fine. If your dataset isn't too big, you aren't going to hit shard and memory limits, and if you do, chances are you're already in a large enough organisation that you'll have the manpower to do the required maintenance. It's not rocket science.
> This seems to be coming from the “we must make ElasticSearch AI-compatible” department more than anything.
I don't see the problem in that. It'd be great to have agentic capabilities embedded into Kibana and ES as long as it's not user hostile.
Yeah, it's like this Dropbox service they made a big deal about, when one could just build one's own with rsync and some bash scripting.
Nah, "Any other vector DB" starts to fall apart once you need stuff like scripted scoring like OP uses. Then it starts to be a question of, "do you need ANN for performance?" since SQLite only does brute-force vector scoring. And granted, brute-force is performant for far more vectors than most people give it credit for, but it definitely hits a wall well below 1 million if you want it to have webpage-type latency.
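For anyone unsure what "brute-force vector scoring" means in practice, here is a minimal pure-Python sketch. It is not how SQLite or any particular extension implements it; the function names and toy vectors are made up for illustration. The point is only that every stored vector gets scored on every query, so cost grows linearly with the number of vectors, which is the wall an ANN index is meant to avoid.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, vectors, k):
    """Score every stored vector against the query: O(n * d) work per search."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored)  # list of (score, doc_id), best first


# Toy data (hypothetical ids and 2-dimensional vectors).
vectors = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}
results = brute_force_top_k([1.0, 0.1], vectors, k=2)
```

With real embeddings (hundreds of dimensions) the same loop is what you would vectorize or push into the database; the linear scan itself does not go away without an index.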
Maintaining Elasticsearch isn't free, but picking an underpowered db and having to port to the right one is also quite time-consuming.
It's also odd to say a tabular database can replace a document store. Sure, it can, but that's not good practice from my point of view.
also, I've run ES on an old laptop and it worked really well, so the cost of it can be pretty low if you're still in development
I agree for casual usage, but this seems targeted towards enterprise setups, where it makes much more sense to use something like ElasticSearch if you're already in the Amazon cloud, especially if you're using the advanced features it provides, as they are.
The design they talk about includes 3 different types of memory. They store those kinds of memory separately, so that if there are 10 users, all 10 access the memories that are more general ("what bulbs work with this kind of light fixture"), while user-specific memories are segregated ("sarah has three lightbulbs"). The different memory types are ranked together, leading to a different result. So this is a novel design and use of ElasticSearch-specific features.
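A rough sketch of the shared-versus-segregated idea, to make it concrete. This is not the article's implementation and does not use any Elasticsearch API: the `Memory` class, the `owner is None` convention for shared memories, the second user "tom", and the fixed scores are all assumptions for illustration, and it models only two scopes rather than the three memory types the article describes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Memory:
    text: str
    owner: Optional[str]  # None = shared with every user (hypothetical convention)
    score: float          # placeholder for a per-query relevance score


def recall(memories: List[Memory], user: str, k: int) -> List[Memory]:
    """Keep shared memories plus this user's own, then rank them together."""
    visible = [m for m in memories if m.owner is None or m.owner == user]
    return sorted(visible, key=lambda m: m.score, reverse=True)[:k]


# Toy store: one general memory and two user-specific ones.
memories = [
    Memory("what bulbs work with this kind of light fixture", None, 0.80),
    Memory("sarah has three lightbulbs", "sarah", 0.90),
    Memory("tom has no spare bulbs", "tom", 0.95),
]

sarah_view = [m.text for m in recall(memories, "sarah", k=5)]
tom_view = [m.text for m in recall(memories, "tom", k=5)]
```

Both users see the general memory, but neither sees the other's personal one, so the same question can produce a different ranked list per user. In a real search engine the filter and the scoring would happen inside the query rather than in application code.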
Would be interesting if one can replace ElasticSearch with something like Typesense here