I wonder how this compares to a more agentic approach where the LLM composes SQL queries to answer the questions, for example.
Yeah, I mean for many real-world-scale datasets you don’t want to blow the whole context window on a massive markdown file. Instead you can provide a tool that presents the data as a SQLite database. In my testing, Claude Code seems very capable of answering questions via SQLite queries, or even `head` and `grep` on CSV files.
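A minimal sketch of that kind of tool in Python, using only the standard library (the `sales.csv` file, its columns, and the table name are hypothetical): load the CSV into SQLite once, then expose a single query function for the agent to call, so only query results ever reach the context window.

```python
import csv
import sqlite3

# Load a CSV into SQLite so an agent can query it instead of
# reading the entire file into context. "sales.csv" is a placeholder.
with open("sales.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    rows = list(reader)

conn = sqlite3.connect("sales.db")
cols = ", ".join(f'"{c}"' for c in header)
placeholders = ", ".join("?" for _ in header)
conn.execute(f"CREATE TABLE IF NOT EXISTS sales ({cols})")
conn.executemany(f"INSERT INTO sales VALUES ({placeholders})", rows)
conn.commit()

# The agent-facing tool just runs whatever query the LLM writes.
def run_sql(query: str) -> list[tuple]:
    return conn.execute(query).fetchall()

print(run_sql("SELECT COUNT(*) FROM sales"))
```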
But the result from the SQL query is going to be... a table. So at some point, tables need to go into context, and we need to know how well LLMs can incorporate those tables.
This was exactly my thought. Rather than feed the table directly to the LLM, build agents that extract the data and have the LLM act on the extracted data items. At that point the table format is just a preference issue.
The author didn’t see much more than 60% accuracy, which is not very useful for many (most?) real-world tasks.
> Agents that extract the data

Are we really reinventing data frame readers just to have an LLM in the critical path?
Reinventing? No. Using? Yes, for a lot of good reasons.
LLMs are expensive. Spending tokens to do something in bulk that is well suited to existing tools and algorithms is wasteful and slow. And the main reason: the original author saw only about a 60% success rate when using an LLM for the task. Why spend many times more time, money, and energy to put an LLM on a well-understood preparatory task it is bad at, when you can get much better results far more cheaply with off-the-shelf tools and feed their results to the LLM for its unique value?
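A concrete sketch of that division of labour, using pandas (file and column names are placeholders): the deterministic aggregation happens outside the LLM, and only the small, already-correct summary goes into the prompt.

```python
import pandas as pd

# Placeholder file and columns; the point is that the bulk,
# well-understood work is done by ordinary tools, not the LLM.
df = pd.read_csv("sales.csv")
summary = df.groupby("region")["revenue"].sum().sort_values(ascending=False)

# Only this small, already-correct result goes into the model's context.
prompt = (
    "Given these revenue totals by region, which regions look "
    f"underperforming, and why?\n\n{summary.to_string()}"
)
```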
Well, ironically, you then have the issue of how to present your database schema (including important things like the values in categorical fields) to the LLM, and in what format, so you never really escape this issue.
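One rough way to handle that (a sketch, assuming the SQLite setup from earlier; the 10-value cutoff for treating a column as categorical is an arbitrary choice): generate a compact schema description, including sample values for low-cardinality columns, and put that in the prompt instead of the data itself.

```python
import sqlite3

def describe_table(conn: sqlite3.Connection, table: str, max_values: int = 10) -> str:
    """Build a compact schema description for an LLM prompt.
    Columns with few distinct values get their values listed inline."""
    lines = [f"Table {table}:"]
    for _, name, col_type, *_ in conn.execute(f'PRAGMA table_info("{table}")'):
        distinct = [r[0] for r in conn.execute(
            f'SELECT DISTINCT "{name}" FROM "{table}" LIMIT {max_values + 1}')]
        if len(distinct) <= max_values:
            lines.append(f"  {name} ({col_type or 'TEXT'}): values {distinct}")
        else:
            lines.append(f"  {name} ({col_type or 'TEXT'})")
    return "\n".join(lines)

conn = sqlite3.connect("sales.db")
print(describe_table(conn, "sales"))  # paste this into the system prompt
```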