His solution still relies on greedy (temperature 0) sampling, which is probably not optimal for model performance on various tasks. For example, Gemini 2.5 uses temperature 1 by default. But deterministic inference with temperature >0 can still be achieved by using pseudorandom sampling with a fixed seed.

Conceptually setting temperature to be >0 doesn't actually introduce any non-determinism. If your sampler is seeded then it will always choose the same next token. Higher temperature only flattens the logit distribution.

The point of the blog is that even at "supposed" deterministic generative sampling, non-determinism creeps in. This in turn has disastrous effects in very real experiments.

My point is that greedy sampling is not just not sufficient but also not necessary for deterministic inference.