Does SQLite not have a lemon parser generated for its SQL?

When I ported pikchr (also from the SQLite project) to Go, I first ported lemon, then the grammar, then supporting code.

I always meant to do the same for its SQL parser, but the pikchr grammar is orders of magnitude simpler than the SQL one.

When I refer to "extracting sources from the SQLite codebase", a big part of that is indeed compiling Lemon and running it against a custom implementation of parse.y [1].

The problem comes from how SQLite's upstream parse.y works. Because it doesn't actually build a parse tree and instead generates bytecode directly, the interpretation of any node labelled "id" or "nm" is buried inside the source code behind many layers of functions. You can see for yourself by looking at SQLite's parse.y [2].
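To make that concrete, here is a toy sketch in Python of the two styles of grammar action. It is not SQLite's code: the names `parse_to_tree`, `parse_to_bytecode`, `open_table` and `read_column` are hypothetical, and the opcode tuples are only illustrative. It just shows why the role of a generic name token is easy to recover from a tree but hidden when actions emit instructions directly.

```python
from dataclasses import dataclass

# Toy input: tokens for "SELECT <column> FROM <table>".
# Assumption: both identifiers arrive with the same generic label,
# similar in spirit to "nm" in a grammar file.


@dataclass
class Name:
    role: str  # "column" or "table", recorded explicitly in the tree
    text: str


@dataclass
class Select:
    column: Name
    table: Name


def parse_to_tree(tokens):
    """Tree-building actions: each identifier's role is stored in the result."""
    _select, col, _from, tbl = tokens
    return Select(column=Name("column", col), table=Name("table", tbl))


def open_table(program, nm):
    # Hypothetical helper: only this function "knows" nm is a table name.
    program.append(("OpenRead", nm))


def read_column(program, nm):
    # Hypothetical helper: only this function "knows" nm is a column name.
    program.append(("Column", nm))


def parse_to_bytecode(tokens):
    """Direct-emit actions: the role lives only in which helper consumed the name."""
    _select, col, _from, tbl = tokens
    program = []
    open_table(program, tbl)
    read_column(program, col)
    program.append(("ResultRow",))
    return program


tokens = ["SELECT", "name", "FROM", "users"]
tree = parse_to_tree(tokens)
program = parse_to_bytecode(tokens)
```

With the tree you can ask `tree.column.role` directly. With the direct-emit version, a tool that wants to know what a name means has to trace which helper the action called, which is the kind of digging described above.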

[1] https://github.com/LalitMaganti/syntaqlite/tree/main/syntaql...

[2] https://sqlite.org/src/file?name=src/parse.y&ci=trunk

Ah, that makes sense. Thanks for the details. I see now that your article basically had all the information I needed to figure this out if I’d thought a bit harder!

Also, nice work: this makes the world just a little nicer!

Correct [0]. This was also my first thought after reading:

> Unfortunately, unlike many other languages, SQLite has no formal specification describing how it should be parsed. It doesn’t expose a stable API for its parser either. In fact, quite uniquely, in its implementation it doesn’t even build a parse tree at all! The only reasonable approach left in my opinion is to carefully extract the relevant parts of SQLite’s source code and adapt it to build the parser I wanted

Did they properly research the problem in the first place?

[0]: https://sqlite.org/lemon.html

I was also baffled. "No formal specification"? Two minutes of browsing is enough to find it: https://github.com/sqlite/sqlite/blob/master/src%2Fparse.y

I'm very well aware of parse.y. If you look into the syntaqlite code, you'll find it's a critical part of how the whole "source extraction" mentioned in the article works [1].

To be clear, when I say "formal specification", I'm not just talking about the formal grammar rules but also about how those are interpreted in practice. Something closer to the ECMAScript specification (https://ecma-international.org/publications-and-standards/st...).

[1] https://github.com/LalitMaganti/syntaqlite/blob/93638c68f9a0...