I can't find any info on what exactly is open sourced.

And in any case, what does open source actually mean for an LLM? It's not like you can look inside it to see what it's doing.

The model is not "open source", but it is an open weights model.

You can download it from the link given here at the top and you can run it on your own hardware, with whichever open-source harness you prefer, without having to worry about token cost or about subscription limits or about any future degradation in performance that you cannot control.

Recent history has shown that these risks are very significant.

Being open weights is important for anyone who wants to use an LLM. Being open source is important only for the subset of those users who have the will, the knowledge and the means to train a model from its training data.

Having access to the training data used by a model would be very nice, but the reality is that a normal LLM user already benefits a lot from an open-weights model with an open-source harness, whereas it would be much harder for them to exploit access to all the information about how the LLM was created.

For me, open source means that the entire training data is open sourced, as well as the code used for training; otherwise it's open weight. You can run it where you like, but it's a black box. Nomic's models are a good example of open source.

Even with all training data provided, won't it still be a black box? Unless one trains it exactly the same way, in the exact same order for each piece of data, potentially requiring the exact same hardware with specific optimizations disabled due to race conditions, etc., the final weights will be different. So you still can't tell whether the released weights contain anything extra, which leaves them a black box, no? There isn't an equivalent of reproducible builds for LLM weights, even if all of this were provided, right?
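To make the ordering point concrete, here is a minimal sketch in plain Python (not anyone's actual training code; the values and the `sum_in_order` helper are made up for illustration). Floating-point addition is not associative, so if a parallel reduction adds the same numbers in a different order from run to run, the result can differ even though the inputs are identical.

```python
def sum_in_order(values, order):
    """Accumulate values following the given index order.

    Each order stands in for one possible schedule of a parallel reduction.
    """
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Three illustrative "contributions" with very different magnitudes.
grads = [1e16, 1.0, -1e16]

a = sum_in_order(grads, [0, 1, 2])  # (1e16 + 1.0) + -1e16: the 1.0 is rounded away
b = sum_in_order(grads, [0, 2, 1])  # (1e16 + -1e16) + 1.0: the 1.0 survives

print(a, b)  # 0.0 1.0
```

Real training runs do billions of such reductions, so tiny differences like this can compound into weights that are not bit-identical between runs.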

Yes, the weights are basically compiled code, compiled from the source data and the training code.
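Following the compiled-code analogy, a reproducible-build style check would amount to retraining and comparing the resulting file byte for byte with the released one. A rough sketch of that comparison, assuming both sets of weights are single files on disk (the function names and the file names in the usage comment are hypothetical):

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(path_a, path_b):
    """True only if the two files are bit-identical (same SHA-256)."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)


# Hypothetical usage:
# same_artifact("released_weights.bin", "my_retrained_weights.bin")
```

This only works if training is bit-for-bit deterministic. Any difference in the weights, however small, changes the hash, so a mismatch by itself says nothing about whether something extra was added.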

Look up Olmo 3, where they have open weights, checkpoints, training data, and training process.

AllenAI is the most fully open AI project I know of.