Practically, the performance loss of making it truly repeatable (which takes parallelism reduction or coordination overhead, not just temperature and randomizer control) is unacceptable to most people.

It's also just not very useful. Why would you re-run the exact same inference a second time? This isn't like a compiler where you treat the input as the fundamental source of truth, and want identical output in order to ensure there's no tampering.