Wow, half of this article deeply resonates with what I am working on.

Text-to-SQL is the funniest example. It seems to be the "hello world" of agentic use in enterprise environments. It looks so easy, so clear, so straightforward. But just because the concept is easy to grasp (LLMs are great at generating markup or code, so let's have them translate natural language to SQL) doesn't mean it is easy to get right.

I have spent the past 3 months building a solution that actually bridges the stochastic nature of AI agents and the need for deterministic queries. And boy oh boy is that rabbit hole deep.

SQL is never just the tables and joins. It’s knowing the table grains, the caveats, all the modelling definitions and errors (and your data warehouse almost certainly has modelling errors, because the business logic in your app drifts), plus the business context needed to answer questions correctly.

Probably 60% of the time I spend writing SQL is validation. A single hallucinated assumption can blow up the whole query. And there are questions without a clear modelling approach that you still have to deal with.
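To make the grain point concrete, here is a minimal sketch of the kind of check I mean, using an in-memory SQLite database. The `orders` and `order_items` tables and the `duplicate_keys` helper are made up for illustration; a real warehouse would need the same idea in its own dialect.

```python
import sqlite3


def duplicate_keys(conn, table, key_cols):
    """Return key values that appear more than once, i.e. violations of the assumed grain.

    `table` and `key_cols` are trusted identifiers here (this is a sketch, not
    something to expose to user input).
    """
    cols = ", ".join(key_cols)
    sql = (
        f"SELECT {cols}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 11, 50.0);
    INSERT INTO order_items VALUES (1, 'a'), (1, 'b'), (2, 'c');
    """
)

# Assumption being validated: orders is one row per order_id.
orders_dupes = duplicate_keys(conn, "orders", ["order_id"])

# order_items is NOT one row per order_id, so joining on it fans out orders.
items_dupes = duplicate_keys(conn, "order_items", ["order_id"])

# The "obvious" join silently double counts order 1.
naive_total = conn.execute(
    "SELECT SUM(o.amount) FROM orders o "
    "JOIN order_items i ON o.order_id = i.order_id"
).fetchone()[0]

# The total at the correct grain.
true_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print(orders_dupes, items_dupes, naive_total, true_total)
```

The join version runs without error and returns a plausible-looking number, which is exactly why this kind of mistake needs an explicit check rather than a glance at the output.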

Plus, a lot of the SQL training data in LLMs is pretty bad, so I’ve not been impressed yet. Certainly not enough to let business users run an AI query agent unchecked.

I’m sure AI will get good at this, so I’m building up my warehouse knowledge base and putting together documentation as best I can. It’s just pretty awful today.

Text2SQL accuracy was 75% on BIRD-bench 6 months ago. Now it's 80%. Humans are still at 90+%. We're not quite there yet. I suspect text-to-SQL needs a lot of intermediate state and composition of abstractions, which vanilla attention is not great at.

https://bird-bench.github.io/

Text-to-SQL is solved by having good UX and a reasonable team that’s in touch with the customers' needs.

If users have to come up with novel queries all the time, to the point that text-to-SQL is warranted, that is a failure of product design.

This 1000x. I’ve sat through several vendor demos of BI tools that have a chatbot and seen my PM go all starry eyed that you can ask it “show me top x over the last week” and get a chart back. How an empty text box is easier to use than a UI with several filter drop-downs, I’ll never understand, and I suspect that the people impressed with this stuff don’t know either.

This is exactly it. AI is sniffing out the good data models from the bad. Easy to understand? AI can understand it too! Complex business mess with endless technical debt? Not so much.

But this is precisely why we're seeing startups build insane things fast while well-established companies are still questioning whether it's even worth it.

There were some iffy things about the text-to-SQL datasets though, historically.

People got good results on the test datasets, but the test datasets had errors, so the high performance was actually just the models being overfitted.

I don't remember where this was identified. It was fairly recent, but before GPT-5.

> building a solution that actually bridges the stochastic nature of AI agents and the need for deterministic queries

Wait, but this just sounds unhinged. Why oh why?

The problem is that precision is expensive. Writing is thinking. Writing software is defining the business problem.

People don't know exactly what they want from the data warehouse, just a fuzzy approximation of it. You need stochastic software (AI) to map the imprecise instructions from your users to precise instructions the warehouse can handle.
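One way to picture that split, as a rough sketch and not a description of anyone's actual product: the model only fills in a small structured spec, and everything after that is deterministic. The `METRICS` and `DIMENSIONS` allowlists, the `QuerySpec` shape, and the `orders` table are all hypothetical, and the model call is replaced by a hard-coded JSON string.

```python
import json
import sqlite3
from dataclasses import dataclass
from datetime import date

# Hypothetical vetted definitions; the model may only pick from these keys.
METRICS = {"revenue": "SUM(amount)", "order_count": "COUNT(*)"}
DIMENSIONS = {"region": "region", "order_date": "order_date"}


@dataclass(frozen=True)
class QuerySpec:
    metric: str
    dimension: str
    since: str  # ISO date, e.g. "2024-01-01"
    limit: int


def parse_spec(raw_json: str) -> QuerySpec:
    """Validate the model's output; reject anything outside the allowlists."""
    data = json.loads(raw_json)
    spec = QuerySpec(
        metric=data["metric"],
        dimension=data["dimension"],
        since=data["since"],
        limit=int(data["limit"]),
    )
    if spec.metric not in METRICS:
        raise ValueError(f"unknown metric: {spec.metric!r}")
    if spec.dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {spec.dimension!r}")
    if not 1 <= spec.limit <= 1000:
        raise ValueError("limit out of range")
    date.fromisoformat(spec.since)  # raises ValueError on a malformed date
    return spec


def render(spec: QuerySpec):
    """Same spec in, same SQL out. Values are bound as parameters, never inlined."""
    dim = DIMENSIONS[spec.dimension]
    sql = (
        f"SELECT {dim} AS dim, {METRICS[spec.metric]} AS value "
        f"FROM orders WHERE order_date >= ? "
        f"GROUP BY {dim} ORDER BY value DESC, dim LIMIT ?"
    )
    return sql, (spec.since, spec.limit)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (region TEXT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('EU', '2024-01-05', 100.0), ('EU', '2024-01-06', 50.0),
        ('US', '2024-01-06', 120.0), ('US', '2023-12-01', 999.0);
    """
)

# Stand-in for the stochastic step: pretend the model produced this JSON.
model_output = (
    '{"metric": "revenue", "dimension": "region", '
    '"since": "2024-01-01", "limit": 5}'
)
sql, params = render(parse_spec(model_output))
rows = conn.execute(sql, params).fetchall()
print(rows)
```

The fuzzy part is confined to choosing a metric, a dimension, and a date range. A spec that names anything outside the allowlists fails in `parse_spec` before any SQL is built, so the warehouse only ever sees queries of a shape someone has already reviewed.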

glad it resonates, that was the intention