>Yes, you get the same predicted token every time for a given context. But why that token and not a different one? Too many factors to reliably abstract.
A fixed input-to-output mapping is determinism. Prompt instability (a small change in the prompt producing a very different output) has nothing to do with determinism by any definition of the word. Too many people confuse the two for some reason. Also, determinism is a pretty niche thing that is only necessary for reproducibility, and prompt instability/unpredictability is irrelevant for practical usage, for the same reason as with humans: if the model or human misunderstands the input, you keep correcting the result until it's right by your criteria. You never need to reroll the result, so you never see the stochastic side of LLMs.
But there is no fixed input-to-output mapping in current popular LLMs.
You mean "corporate inference infrastructure", not LLMs. The reason for different outputs at temperature 0 (t=0) is mostly batching optimization. LLMs themselves are indifferent to that: you can run them deterministically any time if you don't care about optimal batching and the lowest possible inference cost. And even then, e.g. Gemini Flash is deterministic in practice even with batching, although DeepMind doesn't strictly guarantee it.
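To make the batching point concrete, here is a toy sketch in Python. It is not how any real inference stack works. It only assumes that a different batch layout can change the order in which floating-point terms get added up. Floating-point addition is not associative, so the same terms summed in a different order can give a slightly different logit, and at t=0 that can flip the argmax when two tokens are nearly tied. The names `reduce_in_order`, `greedy_pick`, and `contributions` are made up for illustration.

```python
def reduce_in_order(terms):
    """Add floats strictly left to right, with no compensated summation."""
    total = 0.0
    for t in terms:
        total += t
    return total


def greedy_pick(contributions, reverse):
    """Greedy (t=0) choice: sum each token's terms in the given order,
    then return the token with the largest logit plus all logits."""
    logits = {}
    for token, terms in contributions.items():
        ordered = list(reversed(terms)) if reverse else list(terms)
        logits[token] = reduce_in_order(ordered)
    return max(logits, key=logits.get), logits


# Hypothetical per-token contributions. Both tokens have the same terms,
# so in exact arithmetic their logits would be tied.
contributions = {"A": [0.1, 0.2, 0.3], "B": [0.3, 0.2, 0.1]}

# Same "model", same input, two different reduction orders.
pick_forward, logits_forward = greedy_pick(contributions, reverse=False)
pick_reverse, logits_reverse = greedy_pick(contributions, reverse=True)

print(pick_forward, logits_forward)
print(pick_reverse, logits_reverse)
```

With a fixed reduction order, every run picks the same token. Only changing the order changes the pick, which is why this is a property of the serving setup and not of the model weights.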
This is all currently irrelevant; making it work well is a much bigger problem. As soon as there's paying demand for reproducibility, solutions will appear. This is a matter of business need, not a technical issue.