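Conceptually, setting temperature > 0 doesn't by itself introduce any non-determinism. If your sampler is seeded, it will always choose the same next token for the same input. A higher temperature only flattens the probability distribution derived from the logits: the logits are divided by the temperature before the softmax, so larger values pull the token probabilities closer together.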
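To make this concrete, here is a minimal sketch using NumPy. The logits, the seed, and the helper names `softmax_with_temperature` and `sample_tokens` are made up for illustration and are not taken from any particular inference library. It shows two things: a seeded sampler returns the same draws on every run, and temperature only changes how peaked the distribution is.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

def sample_tokens(logits, temperature, seed, n=10):
    """Draw n token ids from one distribution using a freshly seeded generator."""
    rng = np.random.default_rng(seed)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choice(len(probs), size=n, p=probs).tolist()

# Hypothetical logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

# Same seed, same logits, same temperature: the draws are identical.
run_a = sample_tokens(logits, temperature=1.5, seed=42)
run_b = sample_tokens(logits, temperature=1.5, seed=42)
print(run_a == run_b)

# Temperature only reshapes the distribution: low is peaked, high is flatter.
print(softmax_with_temperature(logits, 0.5))
print(softmax_with_temperature(logits, 2.0))
```

The randomness comes entirely from the generator state, so fixing the seed fixes the output regardless of the temperature you pick. This sketch only covers the sampling step itself and assumes the logits are identical between runs.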