Yeah I quite agree with this take. I don't understand why editors aren't utilizing language servers more for making changes. It's crazy to see agents running grep and sed and awk and stuff; all of that should be provided through a very efficient cursor-based interface by the editor itself.
And for most languages, they shouldn't even be operating on strings, they should be operating on token streams and ASTs
Strings are a universal interface with no dependencies. You can do anything in any language across any number of files. Any other abstraction heavily restricts what you can accomplish.
Also, LLMs aren't trained on ASTs, they're trained on strings -- just like programmers.
No, it’s not really “any string.” Most strings sent to an interpreter will result in a syntax error. Many Unix commands will report an error if you pass in an unknown flag.
In theory, there is a type that describes what will parse, but it’s implicit.
Exactly. LLMs are trained on huge amounts of bash scripts. They “know” how to use grep/awk/whatever. ASTs are, I assume, not really part of that training data. How would they know how to work well with one? LLMs are trained on what humans do to code. Yes, I assume down the road someone will train more efficient versions that can work more closely with the machine. But LLMs work as well as they do because they have a large body of “sed” statements in their statistical models.
They also know how to use modern options like fd and rg, which allow more complex operations with a single call.
Tree-sitter is more or less a universal AST parser you can run queries against. Writing queries against an AST that you incrementally rebuild is massively more powerful and precise in generating the correct context than manually writing infinitely many shell pipeline one-liners and correctly handling all of the edge cases.
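To make that concrete, here's roughly what that kind of lookup looks like with the Python bindings. This is a minimal sketch assuming a recent py-tree-sitter and the tree-sitter-python grammar package (the Language/Parser constructors have shifted a bit between binding versions); it walks the tree instead of using the Query API, with the equivalent declarative query shown in a comment.

    # Sketch: find every function definition and report its name and location.
    import tree_sitter_python as tspython
    from tree_sitter import Language, Parser

    PY = Language(tspython.language())
    parser = Parser(PY)

    src = b"""
    def handler(req):
        return dispatch(req)

    def dispatch(req):
        pass
    """

    tree = parser.parse(src)

    # The equivalent tree-sitter query would be:
    #   (function_definition name: (identifier) @fn.name)
    def functions(node):
        if node.type == "function_definition":
            name = node.child_by_field_name("name")
            yield name.text.decode(), node.start_point
        for child in node.children:
            yield from functions(child)

    for name, (row, col) in functions(tree.root_node):
        print(f"{name} defined at line {row + 1}, column {col}")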
I agree with you, but the question is more whether existing LLMs have enough training with AST queries to be more effective with that approach. It’s not like LLMs were designed to be precise in the first place
generating code that doesn't run is just a waste of electricity.
It's so weird that codex/claude code will manually read through sometimes dozens of files in a project because they have no easy way to ask the editor to "Find Usages".
Even though efficient use of CLI tools might make the token burn not too bad, the models will still need to spend extra effort thinking about references in comments, readmes, and method overloading.
We have that in Scala with the MCP tools metals provides but convincing Claude to actually use the tools has been really painful.
https://scalameta.org/metals/blog/2025/05/13/strontium/#mcp-...
Which is why I wrote a code extractor MCP which uses Tree-sitter -- surely something that directly connects MCP with LSP would be better but the one bridge layer I found for that seemed unmaintained. I don't love my implementation which is why I'm not linking to it.
both opencode and charm's crush support LSPs and MCPs as configs
Also, the business models are incentivized towards efficient token usage.
Really? Github Copilot Agent can search. Interesting.
I agree the current way tools are used seems inefficient. However there are some very good reasons they tend to operate on code instead of syntax trees:
* Way way way more code in the training set.
* Code is almost always a more concise representation.
There has been work in the past training graph neural networks or transformers that get AST edge information. It seems like some sort of breakthrough (and tons of $) would be needed for those approaches to have any chance of surpassing leading LLMs.
Experimentally, having agents use ast-grep seems to work pretty well. So, still representing everything as code, but using a syntax-aware search-and-replace tool.
Didn't want to bury the lead, but I've done a bunch of work with this myself. It goes fine as long as you give it both the textual representation and the ability to walk along the AST. You give it the raw source code, and then also give it the ability to ask a language server to move a cursor that walks along the AST, and then every time it makes a change you update the cursor location accordingly. You basically have a cursor in the text and a cursor in the AST, and you keep them in sync so the LLM can't mess it up. If I ever have time I'll release something, but right now I'm just experimenting locally with it for my Rust stuff.
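Roughly, the keep-them-in-sync part looks like this. Just a sketch of the idea in py-tree-sitter terms rather than the actual implementation: apply the textual edit, tell the old tree what changed, reparse incrementally, and then the AST cursor can be re-resolved at the edited byte range.

    # Sketch: keep a text edit and the tree-sitter AST in sync via incremental reparsing.
    import tree_sitter_python as tspython
    from tree_sitter import Language, Parser

    PY = Language(tspython.language())
    parser = Parser(PY)

    src = b"def handler(req):\n    return dispatch(req)\n"
    tree = parser.parse(src)

    # Text-side edit: rename `dispatch` to `route` at a known byte range.
    start = src.index(b"dispatch")
    old_end = start + len(b"dispatch")
    new_src = src[:start] + b"route" + src[old_end:]
    new_end = start + len(b"route")

    def point_at(buf, byte):
        """(row, column) for a byte offset, as tree-sitter expects."""
        before = buf[:byte]
        return (before.count(b"\n"), byte - (before.rfind(b"\n") + 1))

    # AST-side edit: tell the old tree what changed, then reparse incrementally.
    tree.edit(
        start_byte=start, old_end_byte=old_end, new_end_byte=new_end,
        start_point=point_at(src, start),
        old_end_point=point_at(src, old_end),
        new_end_point=point_at(new_src, new_end),
    )
    tree = parser.parse(new_src, old_tree=tree)

    # The "AST cursor": the node under the text cursor after the edit.
    node = tree.root_node.descendant_for_byte_range(start, new_end - 1)
    print(node.type, node.text)  # identifier b'route'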
On the topic of LLMs understanding ASTs, they are also quite good at this. I've done a bunch of applications where you tell an LLM a novel grammar it's never seen before _in the system prompt_ and that plus a few translation examples is usually all it takes for it to learn fairly complex grammars. Combine that with a feedback loop between the LLM and a compiler for the grammar where you don't let it produce invalid sentences and when it does you just feed it back the compiler error, and you get a pretty robust system that can translate user input into valid sentences in an arbitrary grammar.
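A toy version of that feedback loop, with Lark standing in as the grammar's “compiler” and a hypothetical call_llm() in place of whatever model API you're using (both the grammar and that function are made up for illustration):

    # Give the model a grammar, parse its output, feed parse errors back until it
    # emits a valid sentence. `call_llm` is a placeholder for your model API.
    from lark import Lark
    from lark.exceptions import LarkError

    GRAMMAR = r"""
    start: command
    command: "move" DIRECTION NUMBER
    DIRECTION: "north" | "south" | "east" | "west"
    NUMBER: /[0-9]+/
    %import common.WS
    %ignore WS
    """

    parser = Lark(GRAMMAR)

    def call_llm(system_prompt: str, messages: list) -> str:
        """Placeholder: call whatever chat/completions API you actually use."""
        raise NotImplementedError

    def translate(user_input: str, max_attempts: int = 3) -> str:
        system = (
            "Translate the user's request into a sentence in this grammar. "
            "Reply with the sentence only.\n" + GRAMMAR
        )
        messages = [{"role": "user", "content": user_input}]
        for _ in range(max_attempts):
            candidate = call_llm(system, messages).strip()
            try:
                parser.parse(candidate)   # the "compiler" check
                return candidate          # only valid sentences escape the loop
            except LarkError as err:
                # don't accept the invalid sentence; show the model the error and retry
                messages.append({"role": "assistant", "content": candidate})
                messages.append({"role": "user",
                                 "content": f"That failed to parse: {err}. Try again."})
        raise ValueError("model never produced a valid sentence")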
Sounds like cool stuff, along the lines of structure editing!
The question is not whether it can work, but whether it works better than an edit tool using textual search/replace blocks. I'm curious what you see as the advantage of this approach? One thing that comes to mind is that having a cursor provides some natural integration with LSP signature help
Yes agentic loop with diagnostic feedback is quite powerful. I'd love to have more controllable structured decode from the big llm providers to skip some sources of needing to loop - something like https://github.com/microsoft/aici
I’d love to see how you’re giving this interface to the LLM
> * Way way way more code in the training set.
Why not convert the training code to AST?
You could, but it is extremely expensive to train an LLM that is competitive on coding evals. So, I was assuming use of a model someone else trained.
Also, if it is only trained on code, it's likely to miss out on all the world knowledge that comes from the rest of the data.
Fine-tuning instead of training from scratch might help.
I think you've hit the nail on the head here.
After being pleasantly surprised at how well an AI did at a task I asked of it a few months ago that I thought was much more complicated, I was amused at how badly it did when I asked it to refactor some code to change variable names in one single source file to match a particular coding standard. Having handled the earlier task about as well as a good junior developer given a couple of days, it failed hard at the refactoring, working more at the level of a high school freshman.
Structured output generally gives a nice performance boost, so I agree.
Specifically, I'd love to see widespread structured output support for context free grammars. You get a few here and there - vLLM for example. Most LLMs as a service only support JSON output which is better than nothing but doesn't cover this case at all.
Something with semantic analysis - scope-informed output - would be the cherry on top, but while technically possible, I don't see it arriving anytime soon. But hey - maybe an opportunity for product differentiation.
Yeah see my other comment above, I've done it with arbitrary grammars, works quite well, don't know why this isn't more widespread
AST is only half of the picture. Semantics (aka the action taken by the abstract machine) are what's important. What code helps with is identifying patterns, which helps in code generation (defmacro, API service generation) because code is the primary interface. The AST is an implementation detail.
If you look at the API exposed by LSP you would understand why. It's very hard to use LSP outside an editor because a lot of it is "where is the symbol at file X, line Y, between these two columns used".
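For example, a bare "find references" call means speaking JSON-RPC and already knowing the exact zero-based line and column of the symbol, which is exactly the bookkeeping an editor does for you. A sketch of what that request looks like on the wire (field names and framing per the LSP spec; the file path and position are made up):

    import json

    params = {
        "textDocument": {"uri": "file:///home/me/project/src/main.rs"},
        "position": {"line": 41, "character": 17},   # 0-based line/column of the symbol
        "context": {"includeDeclaration": True},
    }
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "textDocument/references",
        "params": params,
    }
    body = json.dumps(request)
    # LSP messages are framed with a Content-Length header over stdin/stdout.
    message = f"Content-Length: {len(body.encode())}\r\n\r\n{body}"
    print(message)  # what you'd write to the language server (after initialize)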
You're looking for Serena: https://github.com/oraios/serena
There's a few agents that integrate with LSP servers
opencode comes to mind off the top of my head
it still tends to do a lot of grep and sed though.