Strings are a universal interface with no dependencies. You can do anything in any language across any number of files. Any other abstraction heavily restricts what you can accomplish.
Also, LLMs aren't trained on ASTs, they're trained on strings -- just like programmers.
No, it’s not really “any string.” Most strings sent to an interpreter will result in a syntax error. Many Unix commands will report an error if you pass in an unknown flag.
In theory, there is a type that describes what will parse, but it’s implicit.
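That implicit type can be made concrete: an interpreter's parser is effectively a membership test for it. A minimal sketch in Python (using only the built-in `compile`; the helper name `parses` is my own):

```python
# Most arbitrary strings are not valid programs: Python's compile()
# rejects them with a SyntaxError before anything ever runs.
def parses(src: str) -> bool:
    try:
        compile(src, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

print(parses("x = 1 + 2"))  # True
print(parses("x = = 1"))    # False
```

The "type of strings that parse" is just the set for which this returns True; it exists, but nothing in the string interface advertises it up front.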
Exactly. LLMs are trained on huge amounts of bash scripts. They “know” how to use grep/awk/whatever. ASTs are, I assume, not really part of that training data, so how would they know how to work well with one? LLMs are trained on what humans do to code. Yes, I assume down the road someone will train more efficient versions that work more closely with the machine. But LLMs work as well as they do because they have a large body of “sed” statements in their statistical models.
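For illustration, the sort of one-liner that shows up countless times in that training data (the file `demo.py` and the names are made up; this prints the rewrite to stdout rather than editing in place, since the in-place flag `-i` behaves differently on GNU and BSD sed):

```shell
# Create a toy file, then rewrite every occurrence of old_name
# to new_name -- the bread-and-butter sed substitution idiom.
printf 'def old_name():\n    return old_name\n' > demo.py
sed 's/old_name/new_name/g' demo.py
```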
They also know how to use modern options like fd and rg, which allow more complex operations with a single call.
Tree-sitter is more or less a universal AST parser you can run queries against. Writing queries against an AST that you incrementally rebuild is massively more powerful and precise for generating the correct context than manually writing endless shell-pipeline one-liners and correctly handling all of their edge cases.
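As a sketch of what such a query looks like: Tree-sitter's query language is an S-expression pattern matched against the tree, with `@`-captures naming the nodes you want back. Node names like `function_definition` come from the Python grammar; other grammars use their own node names.

```
;; Match each Python function definition and capture its name node,
;; e.g. to build a symbol index without any regex guesswork.
(function_definition
  name: (identifier) @function.name)
```

The same pattern survives weird formatting, nested definitions, and strings that merely contain the word `def`, which is exactly where grep-style pipelines start accumulating edge cases.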
I agree with you, but the question is more whether existing LLMs have enough training on AST queries to be more effective with that approach. It’s not like LLMs were designed to be precise in the first place.
Generating code that doesn't run is just a waste of electricity.