Hacker News

Sometimes, the reason for non-determinism is implementation-specific. For instance, in GPT-2's source code (I haven't checked other model versions), setting the temperature in the GUI does not lead to a value of 0 but "epsilon" (a very small value larger than 0), to avoid a division by zero error in the code, which makes sense.

For many applications, non-determinism implies "useless". This has been a long standing issue with LDA topic models. In particular in the legal, financial and regulatory domains, if a method is not deterministic, it may be illegal to use it or it may lead to follow-on requirements that one does not want (e.g. all screens shown to humans must be preserved to be able to go back and reconstruct what exactly happened to a particular user in a particular second).