Hacker News

Even with all training data provided, won't it still be a black box? Unless one trains it exactly the same, in the exact same order for each piece of data, potentially requiring the exact same hardware with specific optimizations disabled due to race conditions, etc., the final weights will be different, and so knowing if the original weights actually contain anything extra still leaves any released weights as a black box, no? There isn't an equivalent of reproducible builds for LLM weights, even if all of this was provided, right?