Hacker News

tptacek 4 days ago [ - ]

No? These kinds of analyses all seem to rely on the notion that the LLM-caller needs to accept whatever output the LLM provides. In practice, they discard all the outputs that don't compile, and then a further subset of the ones that don't --- those outputs that aren't instantly clear to the caller.

My intuition for the problem here is that people are fixated on the nondeterminism of the LLM itself, which is of limited importance to the actual problem domain of code generation. The LLM might spit out ancient Egyptian hieroglyphics! It's true! The LLM is completely nondeterministic. But nothing like that is ever going to get merged into `main`.

It's fine if you want to go on about how bad "vibe coding" is, with LLM-callers that don't bother to read LLM output, because they're not competent. But here we're assuming an otherwise competent developer. You can say the vibe coder is the more important phenomenon, but the viber doesn't implicate the halting problem.

skydhash 4 days ago [ - ]

Valid programs are almost infinite. Context free grammars (which describe valid programs) are generative. When you're programming, you are mostly restricting the set of valid program to include only the few that satisfy the specs. Adding an extra 0 to a number is valid, but put that in the context of money transactions, it's a "hell breaks loose" situation.

SO that's why "it compiles" is worthless in a business settings. Of course it should compile. That's the bare minimum of expectations. And even "it passes the tests" is not that great. That just means you have not mess things up. So review and quality (accountability for both) is paramount, so that the proper stuff get shipped (and fixed swiftly if there was a mistake).

ants_everywhere 4 days ago [ - ]

I have LLMs generate Haskell. Having the code compile means everything type checks and is not worthless. That's a huge constraint on valid programs.

skydhash 4 days ago [ - ]

> Having the code compile means everything type checks and is not worthless. That's a huge constraint on valid programs.

In a business settings, what usually matter is getting something into prod and not have bug reports thrown back. And a maintainable code that is not riddled down with debts.

Compiled code is as basic as steering left and right for a F1 driver. Having the tests pass is like driving the car at slow speed and completing a lap with no other cars around. If you're struggling to do that, then you're still in the beginner phase and not a professional. The real deal is getting a change request from Product and and getting it to Production.

tptacek 4 days ago [ - ]

It feels like you stopped reading before "and then a further subset of those".

Again: my claim is simply that whatever else is going on, the halting problem doesn't enter into it, because the user in this scenario isn't obligated to prove arbitrary programs. Here, I can solve the halting problem right now: "only accept branchless programs with finite numbers of instructions". Where's my Field Medal? :)

It always feels like the "LLMs are nondeterministic" people are relying on the claim that it's impossible to tell whether an arbitrary program is branchless and finite. Obviously, no, that's not true.

skydhash 4 days ago [ - ]

> It feels like you stopped reading before "and then a further subset of those".

Pretty sure you've just edited to add that part.

tptacek 4 days ago [ - ]

No, I'd have indicated that in my comment if I had. Sorry, I think you just missed it.

I did add the last paragraph of the comment you just responded to (the one immediately above this) about 5 seconds after I submitted it, though. Doesn't change the thread.