Operating a car (i.e. driving) is certainly not deterministic. Even if you take the same route over and over, you never know exactly what other drivers or pedestrians are going to do, or whether there will be unexpected road conditions, construction, inclement weather, etc. But through experience, you build up intuition and rules of thumb that allow you to drive safely, even in the face of uncertainty.
It's the same when programming with LLMs. Through experience, you build up intuition and rules of thumb that allow you to get good results, even if you don't get exactly the same result every time.
> It's the same when programming with LLMs. Through experience, you build up intuition and rules of thumb that allow you to get good results, even if you don't get exactly the same result every time.
Friend, you have literally described a nondeterministic system. LLM output is nondeterministic. Identical input conditions result in variable output conditions. Even if those variable output conditions cluster around similar ideas or methods, they are not identical.
The problem is that this is completely false. LLMs are actually deterministic. There are a lot more input parameters than just the prompt. If you're using a piece of shit corpo cloud model, you're locked out of managing your inputs because of UX or whatever.
Ah, we've hit the rock bottom of arguments: there's some unspecified ideal LLM model that is 100% deterministic that will definitely 100% do the same thing every time.
We've hit rock bottom of rebuttals, where not only is domain knowledge completely vacant, but you can't even be bothered to read and comprehend what you're replying to. There is no non-deterministic LLM. Period. You're already starting off from an incoherent position.
Now, if you'd like to stop acting like a smug ass and be inquisitive as per the commenting guidelines, I'd be happy to tell you more. But really, if you actually comprehended the post you're replying to, there would be no need since it contains the piece of the puzzle you aren't quite grasping.
> There is no non-deterministic LLM.
Strange then that the vast majority of LLMs that people use produce non-deterministic output.
Funnily enough, I had literally the same argument with someone a few months back in a friend group. I ran the "non-shitty non-corpo completely deterministic model" through Ollama... And immediately got two different answers for the same input.
> Now, if you'd like to stop acting like a smug ass and be inquisitive as per the commenting guidelines,
Ah. Commenting guidelines. The ones that tell you not to post vague allusions to something, not to be dismissive of what others are saying, to respond to the strongest plausible interpretation of what someone says, etc.? Those ones?
> Strange then that the vast majority of LLMs that people use produce non-deterministic output.
> I ran the "non-shitty non-corpo completely deterministic model" through Ollama... And immediately got two different answers for the same input.
With deterministic hardware in the same configuration, using the same binaries, providing the same seed, the same input sequence to the same model weights will produce bit-identical outputs. Where you can get into trouble is if you aren't actually specifying your seed, or with non-deterministic hardware in varying configurations, or if your OS mixes entropy with the standard pRNG mechanisms.
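A toy illustration of the seed point above, assuming a made-up three-token "model" (the `WEIGHTS` table and `generate` function are hypothetical, not any real runner's API): with the same prompt, the same weights and an explicit seed, Python's seeded `random.Random` produces the same token sequence on every run. Passing `seed=None` seeds from OS entropy or the clock instead, which is one of the ways runs stop being reproducible.

```python
import math
import random

# Hypothetical stand-in for model weights: next-token logits keyed by the previous token.
WEIGHTS = {
    0: [0.2, 1.0, 0.5],
    1: [1.2, 0.1, 0.9],
    2: [0.4, 0.8, 0.3],
}

def generate(prompt, steps, temperature, seed=None):
    """Toy autoregressive decoder over a three-token vocabulary."""
    rng = random.Random(seed)  # seed=None seeds from OS entropy or the clock, so runs can differ
    tokens = list(prompt)
    for _ in range(steps):
        logits = WEIGHTS[tokens[-1]]
        m = max(logits)
        # Temperature-scaled softmax weights; random.choices normalizes them itself.
        weights = [math.exp((x - m) / temperature) for x in logits]
        tokens.append(rng.choices(range(len(logits)), weights=weights)[0])
    return tokens

run1 = generate([0], steps=20, temperature=1.0, seed=1234)
run2 = generate([0], steps=20, temperature=1.0, seed=1234)
print(run1 == run2)  # True: same prompt, weights and seed give identical output
```

This only demonstrates the PRNG side of the argument; pure-Python float math here runs on one thread, so the reduction-order issue discussed below does not arise.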
Inference is otherwise fundamentally deterministic. In implementation, certain things like thread-scheduling and floating-point math can be contingent on the entire machine state as an input itself. Since replicating that input can be very hard on some systems, you can effectively get rid of it like so:
A note that "--temperature 0" may not strictly be necessary. Depending on your system, setting the seed and restricting to a single thread will be sufficient. These flags don't magically change LLM formalisms. You can read more about how floating point operations produce non-determinism here:
https://arxiv.org/abs/2511.17826
In this context, forcing single-threading bypasses FP-hardware's non-associativity issues that crop up with multi-threaded reduction. If you still don't have bit-replicated outputs for the same input sequence, either something is seriously wrong with your computer or you should get in touch with a reputable metatheoretician because you've just discovered something very significant.
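The non-associativity point can be seen without a GPU. A minimal sketch in plain Python: the same three numbers summed left to right versus as a tree (a rough stand-in for how partial sums from several threads get combined) round to different doubles. The function names are illustrative only.

```python
def sum_sequential(xs):
    """Left-to-right reduction, as a single thread would do it."""
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_pairwise(xs):
    """Tree reduction: split in half, sum each half, then combine the partial sums."""
    if not xs:
        return 0.0
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])

values = [0.1, 0.2, 0.3]
print((0.1 + 0.2) + 0.3)       # 0.6000000000000001
print(0.1 + (0.2 + 0.3))       # 0.6
print(sum_sequential(values))  # 0.6000000000000001
print(sum_pairwise(values))    # 0.6
```

Same inputs, same arithmetic, different grouping, different last bit. Pinning the reduction order (for example by using a single thread) is what makes the result repeatable.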
> Those ones?
Yes those ones. Perhaps in the future you can learn from this experience and start with a post like the first part of this, rather than a condescending non-sequitur, and you'll find it's a more constructive way to engage with others. That's why the guidelines exist, after all.
> These flags don't magically change LLM formalisms. You can read more about how floating point operations produce non-determinism here:
Basically what you're saying is "for 99.9% of use cases and how people use them they are non-deterministic, and you have to very carefully work around that non-determinism to the point of having workarounds for your GPU and making them even more unusable"
> In this context, forcing single-threading bypasses FP-hardware's non-associativity issues that crop up with multi-threaded reduction.
Translation: yup, they are non-deterministic under normal conditions. Which the paper explicitly states:
--- start quote ---
existing LLM serving frameworks exhibit non-deterministic behavior: identical inputs can yield different outputs when system configurations (e.g., tensor parallel (TP) size, batch size) vary, even under greedy decoding. This arises from the non-associativity of floating-point arithmetic and inconsistent reduction orders across GPUs.
--- end quote ---
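Why would a last-bit rounding difference matter under greedy decoding? A toy sketch with made-up numbers (not output from any real model): if two candidate tokens have nearly tied logits, a different reduction order can flip the argmax. Because each chosen token feeds the next step, one flip is enough for the rest of the output to diverge.

```python
def argmax(xs):
    """Greedy decoding: pick the highest logit; ties go to the lowest index."""
    best = 0
    for i in range(1, len(xs)):
        if xs[i] > xs[best]:
            best = i
    return best

# Hypothetical two-token vocabulary: token 0 has logit 0.6, token 1's logit is
# accumulated from partial contributions whose summation order depends on the setup.
partials = [0.1, 0.2, 0.3]
logit_a = (partials[0] + partials[1]) + partials[2]  # one reduction order
logit_b = partials[0] + (partials[1] + partials[2])  # another reduction order

greedy_a = argmax([0.6, logit_a])
greedy_b = argmax([0.6, logit_b])
print(greedy_a, greedy_b)  # 1 0
```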
> If you still don't have bit-replicated outputs for the same input sequence, either something is seriously wrong with your computer or you should get in touch with a reputable metatheoretician because you've just discovered something very significant.
Basically what you're saying is: If you do all of the following, then the output will be deterministic:
- workaround for GPUs with num_thread 1
- temperature set to 0
- top_k to 0
- top_p to 0
- context window to 0 (or always do a single run from a new session)
Then the output will be the same all the time. Otherwise even "non-shitty corp runners" or whatever will keep giving different answers for the same question: https://gist.github.com/dmitriid/5eb0848c6b274bd8c5eb12e6633...
Edit: So what we should be saying is that "LLM models as they are normally used are very/completely non-deterministic".
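For the temperature setting in that list, a hypothetical sampler sketch (not Ollama's or any other runner's actual code) shows why temperature 0 takes the seed out of the picture: greedy decoding takes the argmax and never draws a random number, whereas with temperature above 0 the choice depends on the RNG state. It says nothing about the floating-point issue, which can still change the logits themselves.

```python
import math
import random

def next_token(logits, temperature, rng):
    """Toy sampler: pick one token id from raw logits."""
    if temperature == 0:
        # Greedy decoding: take the highest logit; no random number is drawn,
        # so the RNG (and therefore the seed) has no effect.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature; subtracting the max keeps exp() from overflowing.
    m = max(logits)
    weights = [math.exp((x - m) / temperature) for x in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.5, 0.3, -1.0]

# Temperature 0: every seed gives the same token.
greedy = {next_token(logits, 0, random.Random(seed)) for seed in range(10)}
print(greedy)  # {0}

# Temperature 1.0: the choice depends on the RNG, but a freshly seeded RNG repeats.
a = [next_token(logits, 1.0, random.Random(7)) for _ in range(5)]
print(len(set(a)))  # 1
```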
> Perhaps in the future you can learn from this experience and start with a post like the first part of this
So why didn't you?
> The problem is that this is completely false. LLMs are actually deterministic. There are a lot more input parameters than just the prompt. If you're using a piece of shit corpo cloud model, you're locked out of managing your inputs because of UX or whatever.
When you decide to make up your own definition of determinism, you can win any argument. Good job.
Yes, that's my point. Neither driving nor coding with an LLM is perfectly deterministic. You have to learn to deal with different things happening if you want to do either successfully.
> Neither driving nor coding with an LLM is perfectly deterministic.
Funny.
When driving, I can safely assume that when I turn the steering wheel in a direction, the car turns that way. That the road that was there yesterday is there today (barring certain emergencies, that's why they are emergencies). That the red light in a traffic light means stop, and the green means go.
And not the equivalent "oh, you're completely right, I forgot to include the wheels, wired the steering wheel incorrectly, and completely messed up the colors"
> Operating a car (i.e. driving) is certainly not deterministic.
Yes. Operating a car or a table saw is deterministic. If you turn your steering wheel left, the car will turn left every time with very few exceptions that can also be explained deterministically (e.g. hardware fault or ice on the road).
Operating LLMs is completely non-deterministic.
> Operating LLMs is completely non-deterministic.
Claiming "completely" is mapping a boolean to a float.
If you tell an LLM (with tools) to do a web search, it usually does a web search. The biggest issue right now is more at the scale of: if you tell it to create turn-by-turn directions to navigate across a city, it might create a python script that does this perfectly with OpenStreetMap data, or it may attempt to use its own intuition and get lost in a cul-de-sac.
Wow. It can do a web search. And that is useful in the context of programming how? Or in any context?
The question is about the result of an action. Given the same problem statement in the same codebase it will produce wildly different results even if prompted two times in a row.
Even for trivial tasks the output may vary between just a simple fix, and a rewrite of half of the codebase. You can never predict or replicate the output.
To quote Douglas Adams, "The ships hung in the sky in much the same way that bricks don't". Cars and table saws operate in much the same way that LLMs don't.
> Wow. It can do a web search. And that is useful in the context of programming how? Or in any context?
Your own example was turning a steering wheel.
A web search is as relevant to the broader problems LLMs are good at, as steering wheels are to cars.
> Given the same problem statement in the same codebase it will produce wildly different results even if prompted two times in a row.
Do you always drive the same route, every day, without alteration?
Does it matter?
> You can never predict or replicate the output.
Sure you can. It's just less like predicting what a calculator will show and more like predicting if, when playing catch, the other player will catch your throw.
You can learn how to deal with reality even when randomness is present, and in fact this is something we're better at than the machines.
> Your own example was turning a steering wheel.
The original example was trying to compare LLMs to cars and table saws.
> Do you always drive the same route, every day, without alteration?
I'm not the one comparing operating machinery (cars, table saws) to LLMs. Again. If I turn a steering wheel in a car, the car turns. If I input the same prompt into an LLM, it will produce different results at different times.
Lol. Even "driving a route" is probably 99% deterministic unlike LLMs. If I follow a sign saying "turn left", I will not end up in a "You are absolutely right, there shouldn't be a cliff at this location" situation.
Edit: and when signs end up pointing to a cliff, or when a child runs onto the road in front of you, these are called emergency situations. Whereas emergency situations are the only available modus operandi for an LLM, and actually following instructions is a lucky happenstance.
> It's just less like predicting what a calculator will show and more like predicting if, when playing catch, the other player will catch your throw
If you think that throwing more and more bad comparisons that don't work into the conversation somehow proves your point, let me dissuade you of that notion: it doesn't.