I don’t understand why people seem to be attacking the “non-determinism” of LLMs. First, I think most people are confusing “probabilistic” with “non-deterministic”, which have very distinct meanings in CS/ML. Non-deterministic typically entails following multiple paths at once; consider regex matching with NFAs, or even the particular view of a list as a monad. The only case where LLMs are “non-deterministic” is when using sampling algorithms like beam search, where multiple paths are considered simultaneously. But most LLM usage being discussed doesn’t involve beam search.
But even if one assumes people mean “probabilistic”, that’s also an odd critique given how probabilistic software has pretty much eaten the world. Most of my career has been spent building reliable products using probabilistic models.
Finally, there’s nothing inherently probabilistic or non-deterministic about LLM generation; these are properties of the sampler applied. I did quite a lot of LLM benchmarking in recent years and almost always used greedy sampling, both for performance (doing things like GSM8K strongly benefits from choosing the most likely path) and for reproducibility. You can absolutely set up LLM tools that have perfectly reproducible results. LLMs have many issues, but their probabilistic nature is not one of them.
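To make that concrete, here is a minimal Haskell sketch of greedy decoding (nextLogits is a made-up stand-in for a real model, not any actual API): the run is a pure function of the prompt, so results are reproducible by construction.

    import Data.List (maximumBy)
    import Data.Ord (comparing)

    type Token = String

    -- Stand-in for a real model: scores for the next token given a context.
    -- (Toy numbers; a real LLM would compute these from the context.)
    nextLogits :: [Token] -> [(Token, Double)]
    nextLogits _ = [("yes", 0.1), ("no", 2.3), ("maybe", 0.5)]

    -- Greedy decoding: always take the single highest-scoring token.
    -- No randomness anywhere, so the output is a pure function of the
    -- prompt -- run it twice, get the same answer twice.
    greedyStep :: [Token] -> Token
    greedyStep = fst . maximumBy (comparing snd) . nextLogits

    decode :: Int -> [Token] -> [Token]
    decode 0 ctx = ctx
    decode n ctx = decode (n - 1) (ctx ++ [greedyStep ctx])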
There was an article on Hacker News a few years back (before LLMs took over) about jobs that could be replaced by a sign saying "$default_result" 99% of the time.
Like being a cancer diagnostician. Or an inspector at a border crossing.
Using LLMs is currently a lot like going to a diagnostician that always responds "no, you're healthy". The answer is probably right. But still we pay people a lot to get that last 1%.
> But still we pay people a lot to get that last 1%.
If people paid for docs as an independent product, or had the foresight to evaluate the quality of the docs before making a purchase and used it as part of their criteria (or were able to do that at all), I think attitudes around docs and "docs bots" and their correctness, utility, etc. would be a lot different.
I think you're being overly (and incorrectly) pedantic about the meaning of "non-deterministic" -- you're applying the fairly niche definition of the term from finite automata, when the people you're refuting are using it in the sense of https://en.wikipedia.org/wiki/Nondeterministic_algorithm: "In computer science and computer programming, a nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs, as opposed to a deterministic algorithm." I think this usage of the term is more common than the finite-automata sense. Dictionary.com doesn't have non-deterministic, but its (relevant) definition of deterministic is "of or relating to a process or model in which the output is determined solely by the input and initial conditions, thereby always returning the same results": https://www.dictionary.com/browse/deterministic
Under that definition of (non-)deterministic, ironically, an NFA is deterministic, because it always produces the same result for the same input.
I'm not being pedantic; this is foundational comp-sci stuff drawing from the theory of computation (it's what the 'N' stands for in NP-complete). That is not a particularly great (or relevant) Wikipedia article you link to (you can look at the citations). The one on "Non-deterministic programming"[0] is probably better. But ultimately you can't just dismiss NFAs, as they serve as the foundation for computational non-determinism. Automata theory isn't just some niche area of computing; it's part of how we actually define what computation is.
We can just go straight to Sipser (from chapter 1; all emphasis is Sipser's)[1]:
> Nondeterminism is a useful concept that has had great impact on the theory of computation. So far in our discussion, every step of a computation follows in a unique way from the preceding step. When the machine is in a given state and reads the next input symbol, we know what the next state will be--it is determined. We call this deterministic computation. In a nondeterministic machine several choices may exist for the next state at any point.
> How does an NFA compute? Suppose that we are running an NFA on an input string and come to a state with multiple ways to proceed. For example, say that we are in state q_1 in NFA N_1 and that the next input symbol is a 1. After reading that symbol, the machine splits into multiple copies of itself and follows all the possibilities in parallel. Each copy of the machine proceeds and continues as before.
This is why the list monad also provides a useful way to explore non-determinism: it mirrors, in functional programming terms, what NFAs do in the classical theory-of-computation framework.
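To illustrate (a toy sketch of my own, not from Sipser): here is an NFA that accepts strings ending in "11", run entirely inside the list monad. Bind is exactly the "splits into multiple copies" step, and no probability appears anywhere.

    import Data.List (nub)

    -- A tiny NFA over {'0','1'}: in state 'A' on a '1' it can either
    -- stay in 'A' or move to 'B' -- the nondeterministic choice point.
    delta :: Char -> Char -> [Char]   -- state -> symbol -> possible next states
    delta 'A' '1' = ['A', 'B']
    delta 'A' '0' = ['A']
    delta 'B' '1' = ['C']
    delta _   _   = []

    -- Running the NFA is a fold in the list monad: (>>=) follows every
    -- possible transition "in parallel", i.e. all copies of the machine.
    run :: String -> [Char]
    run = foldl (\states c -> nub (states >>= \s -> delta s c)) ['A']

    -- Accept iff at least one copy ended in the final state 'C'.
    -- Deterministic overall: same input, same answer, every time.
    accepts :: String -> Bool
    accepts = elem 'C' . run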
To this point, LLMs can perform this type of nondeterministic computation when they follow multiple paths at once doing beam search, but they are unquestionably deterministic when doing greedy decoding, and still deterministic when using other single-path sampling techniques with a known seed (see the sketch after the references).
[0]. https://en.wikipedia.org/wiki/Nondeterministic_programming
[1]. https://cs.brown.edu/courses/csci1810/fall-2023/resources/ch...
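Here is the sketch I mentioned: single-path sampling with an explicit, seeded generator, using Haskell's random package (the two-token distribution is made up for illustration). Fix the seed and the entire sampled path is fixed.

    import System.Random (StdGen, mkStdGen, randomR)

    -- Single-path sampling from a toy next-token distribution,
    -- threading an explicit generator through the computation.
    sample :: [(String, Double)] -> StdGen -> (String, StdGen)
    sample dist g = (pick dist r, g')
      where
        (r, g')          = randomR (0, sum (map snd dist)) g
        pick [(t, _)]  _ = t                     -- last token: take it
        pick ((t, w):ts) x
          | x <= w       = t
          | otherwise    = pick ts (x - w)
        pick []        _ = error "empty distribution"

    -- Same seed, same generator, same single path: deterministic runs.
    main :: IO ()
    main = print (fst (sample [("yes", 0.7), ("no", 0.3)] (mkStdGen 42)))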
You're just explaining what the word "nondeterministic" means when you put it before "finite automaton", which doesn't have much to do with what "nondeterministic" means in other places.
Nondeterminism before "finite automaton" is how the concept is developed in the theory of computation; it later develops into nondeterminism before "Turing machine", and eventually before "polynomial time".
You seem to be misunderstanding the role automata theory plays in the larger framework of the theory of computation. It is not a "special case" of nondeterminism; it is the foundation on which all of the theory of computation is built.
Additionally, I'm demonstrating how that exact same concept plays out in the other framework of computation, functional programming, where it works fundamentally the same way.
I have to say it's a bit surprising to need to defend the fundamental principles of computer science on HN. The topic is "how LLMs compute things", so using the computational definition of nondeterminism seems entirely relevant.
"chaotic system" might be more precise here: small variations in the input may result in arbitrary large differences in the output.
It's not entirely unrelated: the fact that the system is non-deterministic means that it necessarily is probabilistic.
A business can reduce the temperature to 0 and choose a specific seed, and that's the correct approach in most cases, but the answers might still change! (In practice this comes from things like batching and floating-point non-associativity in GPU kernels.)
On the other hand, it's true that there is some probability that is independent of determinism. For example, changing the order of some words might yield different answers: the machine may be deterministic, but there are millions of ways to frame a question, and if the answer depends on trivial details of the question's formatting, there's a randomness there. Similar to how there is randomness in who will win a chess match between two equally rated players, despite the game being deterministic.
> the system is non-deterministic means that it necessarily is probabilistic.
This is not correct. Both of the examples I gave were specifically chosen because they use non-determinism without any associated probabilistic framework.
Regex matching using non-deterministic finite automata requires absolutely zero usage of probability. You simply need to keep track of multiple paths and record whether any of them are in a valid state at the end of processing the string. The list monad as non-determinism is an even more generic model of non-determinism that, again, requires nothing probabilistic in its reasoning.
Non-deterministic things do often become probabilistic because typically you have to make a choice of paths, and that choice can have a probabilistic nature. But again, NFA regex matching is a perfect example where no "choice" is needed.
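A quick sketch of that generic, probability-free flavor in the list monad (my own toy example): every "choice" is just a branch that gets followed exhaustively, and the result is identical on every run.

    -- List-monad nondeterminism with zero probability anywhere:
    -- "choose" a, b, c from ranges, follow every combination, and keep
    -- the ones satisfying the constraint. All Pythagorean triples up to 20:
    triples :: [(Int, Int, Int)]
    triples = do
      a <- [1 .. 20]        -- a nondeterministic "choice"...
      b <- [a .. 20]        -- ...that the list monad explores exhaustively
      c <- [b .. 20]
      if a*a + b*b == c*c then return (a, b, c) else []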
Say there are n paths through that regex. If n1 of those paths lead to one result, and n2 of them lead to another, then you can model the probability of the first result as n1/n. There you go: probability.
I do agree that there is no designed probability in that example, but there is emergent probability.
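Concretely (a toy sketch of that counting view, my own illustration, nothing standard): enumerate every path with the list monad and take the fraction that reach a given result.

    import Control.Monad (replicateM)

    -- Every path through three binary "choices", via the list monad.
    paths :: [[Int]]
    paths = replicateM 3 [0, 1]      -- all 2^3 = 8 choice sequences

    -- "Emergent" probability: the fraction of paths reaching a given
    -- result -- here, paths containing at least two 1s: 4/8 = 0.5.
    emergentP :: Double
    emergentP = fromIntegral n1 / fromIntegral n
      where
        n  = length paths
        n1 = length (filter ((>= 2) . sum) paths)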