Hacker News

I just want to add that tree-sitter is an incremental parser, and an error-tolerant one that leans on heuristics in places.

The difference between a regular (batch) parser and tree-sitter is that a regular parser starts eating tokens at the start of the file and assembles an AST from them, from scratch, every time it runs. Hand-written recursive-descent parsers, for example, build that AST from the top down.

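Tree-sitter also parses left to right (it's an LR-family parser, specifically GLR, so it builds the tree bottom up), but it keeps the previous syntax tree around. When the file changes, you tell it which byte range was edited, and on the next parse it reuses every subtree the edit didn't touch and mostly re-parses just the area around the change.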
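To make the reuse idea concrete, here is a toy model in Python. It is not tree-sitter's actual algorithm: the hypothetical ToyIncrementalParser treats a "file" as a flat run of ';'-terminated statements, and its edit triple (start, old end, new end) only loosely mirrors the start_byte / old_end_byte / new_end_byte fields tree-sitter's edit API takes. Nodes before the edit are reused verbatim, nodes after it are reused with shifted offsets, and only the middle is re-parsed until the parse position lines up with a reusable node again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    start: int  # byte offset of the statement's first character
    end: int    # byte offset just past its terminating ';'
    text: str


class ToyIncrementalParser:
    """Hypothetical toy: a 'file' is a flat run of ';'-terminated statements."""

    def __init__(self):
        self.parsed = 0  # statements actually (re)parsed by the last parse() call

    def _parse_statement(self, src, start):
        semi = src.find(";", start)
        end = len(src) if semi == -1 else semi + 1  # unterminated tail: take the rest
        self.parsed += 1
        return Node(start, end, src[start:end])

    def parse(self, src, old_tree=None, edit=None):
        """Parse src, reusing nodes of old_tree that lie outside edit=(start, old_end, new_end)."""
        self.parsed = 0
        if old_tree is None:
            old_tree, edit = [], (0, 0, 0)  # first parse: nothing to reuse
        start, old_end, new_end = edit
        delta = new_end - old_end

        # Nodes that end before the edit are untouched: reuse them verbatim.
        before = [n for n in old_tree if n.end <= start]
        # Nodes that start after the edit are unchanged too, just shifted.
        after = [Node(n.start + delta, n.end + delta, n.text)
                 for n in old_tree if n.start >= old_end]

        pos = before[-1].end if before else 0
        middle = []
        while True:
            # Drop old nodes that the freshly parsed region has swallowed.
            after = [n for n in after if n.start >= pos]
            # Stop as soon as we land exactly on a reusable node (or hit EOF).
            if pos >= len(src) or (after and after[0].start == pos):
                return before + middle + after
            node = self._parse_statement(src, pos)
            middle.append(node)
            pos = node.end


p = ToyIncrementalParser()
tree = p.parse("a=1;b=2;c=3;d=4;")
print(p.parsed)  # 4
tree = p.parse("a=1;b=20;c=3;d=4;", tree, edit=(7, 7, 8))  # inserted "0" at byte 7
print(p.parsed)  # 1: only "b=20;" was re-parsed
print([n.text for n in tree])  # ['a=1;', 'b=20;', 'c=3;', 'd=4;']
```

The "stop when the boundary lines up" check is what keeps the reuse correct: if an edit deletes a ';' and merges two statements, the re-parsed region simply runs past the stale node and it gets dropped.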

This is what makes incremental edits cheap, but the catch is that tree-sitter has to cope with languages that aren't cleanly parseable from syntax alone, and with half-typed, broken code in the editor. Where a construct is ambiguous, its GLR algorithm tries several parses in parallel and the grammar's precedence rules pick a winner, so some retrying and guesswork is involved.

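Also, some modern languages, like Go, are designed to be parseable without any semantic analysis, but a lot of older languages don't have this property. Notably, C and C++ need a symbol table: a * b; is a multiplication if a is a variable, but a declaration of a pointer named b if a is a type name, and only the declarations seen so far can tell you which. In those cases tree-sitter has to guess, and it might guess wrong.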
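A deliberately tiny sketch of that C ambiguity, written in Python. The hypothetical classify function only handles the single pattern A * B; and is not how any real C parser is structured; it just shows that without a set of known type names both readings survive, while with one the answer is forced. Real C compilers feed typedef names back into the lexer (the classic "lexer hack"), whereas tree-sitter has to commit to one reading from the grammar alone.

```python
def classify(stmt, typedef_names=None):
    """Classify a C statement of the hypothetical form 'A * B;'.

    typedef_names=None models a parser with no symbol table.
    """
    left, right = (part.strip() for part in stmt.rstrip(";").split("*"))
    as_decl = f"declaration: {right} is a pointer to {left}"
    as_expr = f"expression: {left} multiplied by {right}"
    if typedef_names is None:
        # No symbol table: both parses are valid, something has to guess.
        return [as_decl, as_expr]
    # With a symbol table the choice is deterministic.
    return [as_decl] if left in typedef_names else [as_expr]


print(classify("a * b;"))         # both readings
print(classify("a * b;", {"a"}))  # ['declaration: b is a pointer to a']
print(classify("a * b;", set()))  # ['expression: a multiplied by b']
```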

As for what you can and can't do with an AST: you can tell whether something is a function call, a variable reference, or any other piece of syntax, but if you write something like x = 2; then tree-sitter has no idea what x is. Is it a float or an int? Is it a local, a class variable, or a global? A compiler answers that with a symbol table, which it uses to resolve each name to its declaration, but tree-sitter can't do that for you.
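Python's standard library happens to ship the two halves as separate modules, which makes the distinction easy to see. This is an analogy in Python, not tree-sitter: ast produces the syntax tree, and symtable exposes the compiler's scope analysis. The sample source and the two helper functions are made up for illustration. Both x = 2 statements parse to identical trees; only the symbol table knows one writes a global and the other a local. (Python is dynamically typed, so its symbol table answers the scope question but not the int-vs-float one.)

```python
import ast
import symtable

SOURCE = """\
def f():
    global x
    x = 2

def g():
    x = 2
"""


def assignments(source):
    """Return the AST dump of the assignment in each function, keyed by name."""
    dumps = {}
    for func in ast.parse(source).body:
        for stmt in func.body:
            if isinstance(stmt, ast.Assign):
                dumps[func.name] = ast.dump(stmt)
    return dumps


def scope_of_x(source):
    """Ask the compiler's symbol table where x lives inside each function."""
    top = symtable.symtable(source, "<example>", "exec")
    scopes = {}
    for table in top.get_children():
        sym = table.lookup("x")
        scopes[table.get_name()] = "global" if sym.is_global() else "local"
    return scopes


syntax = assignments(SOURCE)
print(syntax["f"] == syntax["g"])  # True: same syntax tree for both statements
print(scope_of_x(SOURCE))          # {'f': 'global', 'g': 'local'}
```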