Mostly, the engineering part of software is dealing with non-determinism, by avoiding it or enforcing determinism. Take something like TCP: it's all about guaranteeing the determinism that either the message is sent and received or it is not. And we have a lot of algorithms that try to guarantee consistency of information between the elements of a system.
But there is an underlying deterministic property in the TCP example. A message is either received within a timeout or not.
How can that be extrapolated to LLMs? How does a system independently know that it's arrived at a correct answer within a timeout or not? Has the halting problem been solved?
You don't need to solve the halting problem in this situation, because you only need to accept a subset of valid, correct programs.
> How can that be extrapolated to LLMs? How does a system independently know that it's arrived at a correct answer within a timeout or not?
That's the catch-22 with LLMs. You're supposed to be both the asker and the verifier, which in practice is not that great. LLMs will just find the snippets of code that match somehow and just act on it (it's the "I'm Feeling Lucky" button with extra steps).
In traditional programming, coding is a notation more than anything. You're supposed to have a solution before coding, but because of how the human brain works, it's more like a blackboard, i.e. a helper for thinking. You write what you think is correct, verify your assumptions, then store and forget about all of it once they hold. Once in a while, you revisit the design and make it more elegant (at least you hope you're allowed to).
LLM programming, when it first started, was more about a direct English to finished code translation. Now, hopes have been scaled down and it's more about precise specs to diff proposals. Which frankly does not improve productivity: either you can have a generator that's faster and more precise (less costly too), or you will need to read the same amount of docs to verify everything as you would to code the stuff in the first place (80% of the time spent coding).
So no determinism with LLMs. The input does not have any formal aspects, and the output is randomly determined. And the domain is very large. It is like trying to find a specific grain of sand on a beach while not fully sure it's there. I suspect most people are doing the equivalent of taking a handful of sand and saying that's what they wanted all along.
No? These kinds of analyses all seem to rely on the notion that the LLM-caller needs to accept whatever output the LLM provides. In practice, they discard all the outputs that don't compile, and then a further subset of those that do: the outputs that aren't instantly clear to the caller.
My intuition for the problem here is that people are fixated on the nondeterminism of the LLM itself, which is of limited importance to the actual problem domain of code generation. The LLM might spit out ancient Egyptian hieroglyphics! It's true! The LLM is completely nondeterministic. But nothing like that is ever going to get merged into `main`.
It's fine if you want to go on about how bad "vibe coding" is, with LLM-callers that don't bother to read LLM output, because they're not competent. But here we're assuming an otherwise competent developer. You can say the vibe coder is the more important phenomenon, but the viber doesn't implicate the halting problem.
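To make the filtering argument concrete, here is a minimal sketch of the accept/reject step, assuming Python as the target language. The candidate strings and the `reviewer_accepts` rule are hypothetical stand-ins; in practice the second filter is a human reading the diff, not a string check.

```python
import ast
from typing import Callable, Iterable, List


def compiles(source: str) -> bool:
    """Return True if the candidate is at least syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def filter_candidates(
    candidates: Iterable[str],
    reviewer_accepts: Callable[[str], bool],
) -> List[str]:
    """Drop outputs that don't parse, then drop the ones the reviewer rejects."""
    parsed = [c for c in candidates if compiles(c)]
    return [c for c in parsed if reviewer_accepts(c)]


# Hypothetical LLM outputs for "write an add function".
candidates = [
    "def add(a, b): return a + b",
    "def add(a, b) return a + b",              # missing colon: does not parse
    "def add(a, b): return eval(f'{a}+{b}')",  # parses, but not something to merge
]


def reviewer_accepts(source: str) -> bool:
    # Stand-in for human review (assumption): reject anything using eval.
    return "eval" not in source


accepted = filter_candidates(candidates, reviewer_accepts)
print(accepted)
```

However random the generator is, only what survives both filters can reach `main`.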
The set of valid programs is practically infinite. Context-free grammars (which describe valid programs) are generative. When you're programming, you are mostly restricting the set of valid programs to include only the few that satisfy the specs. Adding an extra 0 to a number is valid, but in the context of money transactions it's an "all hell breaks loose" situation.
So that's why "it compiles" is worthless in a business setting. Of course it should compile. That's the bare minimum of expectations. And even "it passes the tests" is not that great. That just means you have not messed things up. So review and quality (and accountability for both) are paramount, so that the proper stuff gets shipped (and fixed swiftly if there was a mistake).
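A small sketch of the extra-zero case, with hypothetical function names and a made-up dollars-to-cents spec. Both versions are valid programs and both pass a weak test; only a check against the actual spec tells them apart.

```python
def to_cents(dollars: int) -> int:
    """Convert whole dollars to cents (the assumed spec: 1 dollar = 100 cents)."""
    return dollars * 100


def to_cents_extra_zero(dollars: int) -> int:
    """Same shape, one extra 0. Still a perfectly valid program."""
    return dollars * 1000


def weak_test(convert) -> bool:
    # "It runs and returns an int": both versions pass this.
    return isinstance(convert(5), int)


def spec_test(convert) -> bool:
    # Checks the actual business rule: 5 dollars is 500 cents.
    return convert(5) == 500


for fn in (to_cents, to_cents_extra_zero):
    print(fn.__name__, "weak:", weak_test(fn), "spec:", spec_test(fn))
```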
I have LLMs generate Haskell. Having the code compile means everything type checks and is not worthless. That's a huge constraint on valid programs.
> Having the code compile means everything type checks and is not worthless. That's a huge constraint on valid programs.
In a business setting, what usually matters is getting something into prod and not having bug reports thrown back. And maintainable code that is not riddled with debt.
Compiled code is as basic as steering left and right for an F1 driver. Having the tests pass is like driving the car at slow speed and completing a lap with no other cars around. If you're struggling to do that, then you're still in the beginner phase and not a professional. The real deal is getting a change request from Product and getting it to Production.
It feels like you stopped reading before "and then a further subset of those".
Again: my claim is simply that whatever else is going on, the halting problem doesn't enter into it, because the user in this scenario isn't obligated to prove arbitrary programs. Here, I can solve the halting problem right now: "only accept branchless programs with finite numbers of instructions". Where's my Fields Medal? :)
It always feels like the "LLMs are nondeterministic" people are relying on the claim that it's impossible to tell whether an arbitrary program is branchless and finite. Obviously, no, that's not true.
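As a rough illustration of "only accept a subset", here is a sketch of a checker that accepts Python source only if it is straight-line arithmetic: no branches, loops, calls, or definitions. The allowlist below is an assumption chosen for the example, and it rejects plenty of programs that do halt, which is the point: the checker itself always terminates, and everything it accepts terminates too.

```python
import ast

# Allowlist (an assumption for this sketch): assignments and arithmetic only.
ALLOWED_NODES = (
    ast.Module, ast.Assign, ast.Expr,
    ast.BinOp, ast.UnaryOp, ast.Name, ast.Constant,
    ast.Load, ast.Store,
    ast.operator, ast.unaryop,
)


def is_straight_line(source: str) -> bool:
    """Accept only programs made of branchless, call-free statements.

    Such a program has a finite number of statements and no way to jump
    backwards, so it halts. Anything else is rejected, not analyzed.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))


print(is_straight_line("x = 1 + 2\ny = x * 3"))   # straight-line arithmetic
print(is_straight_line("while True:\n    pass"))  # loop
print(is_straight_line("x = f(1)"))               # call into unknown code
```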
> It feels like you stopped reading before "and then a further subset of those".
Pretty sure you've just edited to add that part.
No, I'd have indicated that in my comment if I had. Sorry, I think you just missed it.
I did add the last paragraph of the comment you just responded to (the one immediately above this) about 5 seconds after I submitted it, though. Doesn't change the thread.
> LLMs will just find the snippets of code that match somehow
This suggests a huge gap in your understanding of LLMs if we are to take this literally.
> LLM programming, when it first started, was more about a direct English to finished code translation
There is no direct English to finished code translation. A prompt like "write me a todo app" has infinitely many programs it could map to, with different tradeoffs that appeal to different people. Even if LLMs never made any coding mistakes, there is no function that maps a statement like that to a specific piece of code unless you're making completely arbitrary choices, like the axiom of choice.
So we're left with the fact that we have to specify what we want. And at that LLMs do exceptionally well.
Right, you handle nondeterminism by applying engineering, e.g. having fail-safes, redundancies, creating robust processes, etc.