I do a lot of work classifying and tagging data. At the moment I use a combination of Excel, Supabase, Jupyter Notebooks and, more recently, Teable.io.
I spend a lot of time doing what I would describe as an AI VLOOKUP: I get an embedding for every record in Table A and look up the top matches from Table B, sometimes adding an LLM step to synthesise the results. The problem is that there is always a manual step, because the results are never perfect. I might get AI to classify 10k rows of data, then go through, sort them, and start replacing the answers I think are incorrect.
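To make the "AI VLOOKUP" idea concrete, here is a rough sketch of the lookup step in plain Python. It assumes the embeddings already exist; the tiny 3-dimensional vectors, the record names and the `ai_vlookup` function are all made up for illustration, and a real setup would get vectors from whatever embedding model you use.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ai_vlookup(table_a, table_b, k=2):
    """For each record in table_a, return the k most similar records in table_b.

    Both tables map a record key to its embedding vector (hypothetical layout).
    """
    results = {}
    for key_a, vec_a in table_a.items():
        scored = [(cosine(vec_a, vec_b), key_b) for key_b, vec_b in table_b.items()]
        scored.sort(reverse=True)  # highest similarity first
        results[key_a] = [(key_b, round(score, 3)) for score, key_b in scored[:k]]
    return results


# Toy vectors standing in for real embeddings (invented data).
table_a = {
    "apple iphone 13": [0.9, 0.1, 0.0],
    "oak dining table": [0.0, 0.2, 0.9],
}
table_b = {
    "Phones": [1.0, 0.0, 0.0],
    "Furniture": [0.0, 0.0, 1.0],
    "Laptops": [0.7, 0.7, 0.0],
}

matches = ai_vlookup(table_a, table_b)
for record, candidates in matches.items():
    print(record, "->", candidates)
```

Keeping the similarity score next to each candidate is what makes the manual pass workable: you can sort by score and review the weakest matches first.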
It seems like quite an obvious little niche, but I think part of the problem is cost. A lot of the tools that charge a monthly subscription can’t afford the cost of the model requests, so they resort to simple labelling. I think some kind of bring-your-own-model (BYO Model) setup might be the way.
Anyone seen or working on such a thing?
chatsheet.com
no embeddings yet, but soon