I agree the current way tools are used seems inefficient. However, there are some very good reasons they tend to operate on code rather than syntax trees:

* Way way way more code in the training set.

* Code is almost always a more concise representation.

There has been past work on training graph neural networks, or transformers that receive AST edge information. It seems like some sort of breakthrough (and tons of $) would be needed for those approaches to have any chance of surpassing leading LLMs.

Experimentally, having agents use ast-grep seems to work pretty well. So, still representing everything as code, but using a syntax-aware search-and-replace tool.
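As a toy illustration of why syntax-aware replace beats plain text replace, here is a sketch using Python's stdlib `ast` module rather than ast-grep itself (the class and identifiers are invented for the example):

```python
import ast

class RenameCall(ast.NodeTransformer):
    """Rewrite calls to `old` into calls to `new`, leaving other text alone."""
    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Call(self, node):
        self.generic_visit(node)
        # Only rewrite when the callee is literally the name we're targeting.
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func = ast.Name(id=self.new, ctx=ast.Load())
        return node

src = "result = log(x) + log_total\nprint(log(y))\n"
new_src = ast.unparse(RenameCall("log", "math_log").visit(ast.parse(src)))
print(new_src)
```

A plain-text `log` → `math_log` substitution would also mangle `log_total`; matching on the tree only touches actual call sites, which is the same property ast-grep's pattern matching gives you.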

Didn't want to bury the lede, but I've done a bunch of work on this myself. It goes fine as long as you give the model both the textual representation and the ability to walk the AST: you give it the raw source code, plus the ability to ask a language server to move a cursor along the AST, and every time it makes a change you update the cursor location accordingly. You basically have a cursor in the text and a cursor in the AST, and you keep them in sync so the LLM can't mess it up. If I ever have time I'll release something, but right now I'm just experimenting locally with it for my Rust stuff.
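A minimal sketch of the dual-cursor idea, assuming Python's stdlib `ast` stands in for the language server (the `SyncedCursor` class and its method names are invented for illustration, not the commenter's actual interface):

```python
import ast

class SyncedCursor:
    """Toy cursor that walks the AST while always reporting the matching
    text span, so the tree view and the text view can't drift apart."""
    def __init__(self, source):
        self.source = source
        self.stack = [ast.parse(source)]  # path from root, for up()

    def children(self):
        return list(ast.iter_child_nodes(self.stack[-1]))

    def down(self, i=0):
        self.stack.append(self.children()[i])
        return self

    def up(self):
        if len(self.stack) > 1:
            self.stack.pop()
        return self

    def text(self):
        """The text span covered by the current node: the 'text cursor'."""
        node = self.stack[-1]
        if not hasattr(node, "lineno"):  # Module has no location info
            return self.source
        return ast.get_source_segment(self.source, node)

src = "total = price * qty\nprint(total)\n"
cur = SyncedCursor(src)
cur.down(0)          # first statement: the assignment
print(cur.text())    # -> total = price * qty
```

After an edit, a real implementation would re-parse (or incrementally update) and re-anchor the stack so both cursors stay valid, which is the sync step described above.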

On the topic of LLMs understanding ASTs, they are also quite good at this. I've built a bunch of applications where you describe a novel grammar the LLM has never seen before _in the system prompt_, and that plus a few translation examples is usually all it takes for it to learn fairly complex grammars. Combine that with a feedback loop between the LLM and a compiler for the grammar, where you never let it surface invalid sentences (when it produces one, you just feed it back the compiler error), and you get a pretty robust system that can translate user input into valid sentences in an arbitrary grammar.

Sounds like cool stuff, along the lines of structure editing!

The question is not whether it can work, but whether it works better than an edit tool using textual search/replace blocks. I'm curious what you see as the advantage of this approach. One thing that comes to mind is that having a cursor provides some natural integration with LSP signature help.

Yes, an agentic loop with diagnostic feedback is quite powerful. I'd love to have more controllable structured decoding from the big LLM providers to skip some of the sources of needing to loop, something like https://github.com/microsoft/aici
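The idea behind structured decoding, sketched in miniature (the stub model, vocabulary, and two-sentence "grammar" are all invented; real systems like aici mask logits over actual tokenizer vocabularies): at each step, only tokens that keep the output a valid prefix of the grammar are allowed, so an invalid sentence can never be produced and the feedback loop is unnecessary.

```python
VOCAB = ["{", "}", '"ok"', ":", "true", "hello"]

# A toy "grammar": the set of all valid token sequences.
SENTENCES = [
    ["{", '"ok"', ":", "true", "}"],
    ["{", '"ok"', ":", '"ok"', "}"],
]

def allowed_next(prefix):
    """All tokens that keep `prefix` extendable to a complete sentence."""
    return {s[len(prefix)] for s in SENTENCES
            if s[:len(prefix)] == prefix and len(s) > len(prefix)}

def stub_scores(prefix):
    # Stand-in for model logits: a fixed preference order over the vocab.
    order = ["hello", "true", '"ok"', "{", "}", ":"]
    return {tok: -order.index(tok) for tok in VOCAB}

def constrained_decode():
    out = []
    while True:
        mask = allowed_next(out)
        if not mask:
            return out  # no continuations left: sentence is complete
        scores = stub_scores(out)
        out.append(max(mask, key=scores.__getitem__))  # best *allowed* token

print(" ".join(constrained_decode()))  # -> { "ok" : true }
```

Note the model still chooses where the grammar branches (it prefers `true` over `"ok"` at the value position), but it can never emit `hello` or an unbalanced brace.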

I’d love to see how you’re giving this interface to the LLM.

> * Way way way more code in the training set.

Why not convert the training code to AST?

You could, but it is extremely expensive to train an LLM that is competitive on coding evals. So, I was assuming use of a model someone else trained.

Also, if it is only trained on code, it's likely to miss out on all the world knowledge that comes from the rest of the data.

Fine-tuning instead of training from scratch might help.