You’d think that, but it seems like big business and governments are treating inference as somehow special. I dunno, maybe low temperatures can highlight this weird situation?
Temperature is an easy knob to twist, after all. Somebody (not me, I’m too poor to pay the lawyers) should do a search and find where the crime starts.
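For anyone wondering what the knob actually does: temperature just divides the logits before the softmax, so low values sharpen the distribution toward argmax and high values flatten it. A minimal sketch (toy logits, not any particular model):

```python
import math

def softmax_with_temperature(logits, temp):
    # Divide logits by temperature before the softmax:
    # temp < 1 sharpens the distribution, temp > 1 flattens it,
    # and temp -> 0 approaches plain argmax (greedy decoding).
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.1)   # nearly one-hot
hot = softmax_with_temperature(logits, 10.0)   # nearly uniform
```

Note that even at very low temperature you only *approach* greedy decoding; determinism also depends on how the runtime evaluates the model, which is what the thread below gets into.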
What does temperature have to do with anything?
Or however deterministic inference is supposed to happen. I don’t know LLMs.
https://152334h.github.io/blog/non-determinism-in-gpt-4/
Well, it's still not deterministic even at temp 0. The tech described in my comment's parent is speculative, and technically it's not even inference, once it's perfectly reproducible.
At that point it's retrieving results from a database.
EDIT: how would OP address my main point, which is that det. inference is functionally equivalent to any arbitrary keyed data storage/retrieval system?
> The tech described in my comment's parent is speculative, and technically it's not even inference, once it's perfectly reproducible.
This is not true. Fabrice Bellard's ts_zip [0] and ts_sms [1] use an LLM to compress text. They beat stuff like xz etc. but are of course much slower. Now... if the model were non-deterministic, you would have trouble decompressing exactly into what it compressed. So it uses a deterministic LLM.
[0] https://bellard.org/ts_zip/ https://news.ycombinator.com/item?id=37152978
[1] https://bellard.org/ts_sms/ https://lobste.rs/s/5srkwz/fabrice_bellard_s_ts_sms_short_me... (funnily enough, many people comment that if it uses an LLM it must be lossy. This is not the case; it's compared to xz on the page precisely because it's lossless)
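The core trick is that a *deterministic* predictor lets the encoder and decoder stay in lockstep, so you can store each symbol's rank under the model's prediction instead of the symbol itself. Here's a toy illustration of that idea (a hypothetical bigram-count "model", nothing like Bellard's actual code, which uses a real LLM plus an entropy coder):

```python
def ranked_symbols(context, alphabet, counts):
    # Order symbols by how often they followed `context`,
    # breaking ties alphabetically. Deterministic by construction.
    return sorted(alphabet, key=lambda s: (-counts.get((context, s), 0), s))

def compress(text, alphabet):
    counts, ctx, ranks = {}, "", []
    for ch in text:
        order = ranked_symbols(ctx, alphabet, counts)
        ranks.append(order.index(ch))  # store the model's rank, not the char
        counts[(ctx, ch)] = counts.get((ctx, ch), 0) + 1
        ctx = ch
    return ranks

def decompress(ranks, alphabet):
    counts, ctx, out = {}, "", []
    for r in ranks:
        # Because the model is deterministic, this prediction is
        # byte-for-byte identical to the one the encoder made.
        order = ranked_symbols(ctx, alphabet, counts)
        ch = order[r]
        out.append(ch)
        counts[(ctx, ch)] = counts.get((ctx, ch), 0) + 1
        ctx = ch
    return "".join(out)
```

A good model makes the rank stream heavily skewed toward 0, which a real entropy coder then squeezes; if the model gave even slightly different predictions on the decode side, the round trip would break. That's the lossless part.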
Deterministic inference isn't speculative, it's achievable if you want it. It's just not the default.
It seems the TL;DR is race conditions, rounding, and other runtime-environment-specific factors.
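The rounding part is easy to see even without a GPU: floating-point addition isn't associative, so summing the same numbers in a different order (e.g. a parallel reduction whose thread scheduling varies run to run) can give slightly different results.

```python
# Floating-point addition is not associative: the two groupings below
# round differently, so they produce different doubles.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False: a carries an extra ulp of rounding error
```

Scale that up to billions of accumulations across nondeterministically scheduled kernels and you get logits that wobble in the last bits, which can flip an argmax at temp 0.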