If the architecture is the same, and the training scripts/data is the same, but the training yielded slightly different weights (but still same model architecture), is it a new model or just a iteration on the same model?
What if it isn't even a re-training from scratch but a fine-tune of an existing model/weights release, is it a new version then? Would be more like a iteration, or even a fork I suppose.
If the architecture is the same, and the training scripts/data is the same, but the training yielded slightly different weights (but still same model architecture), is it a new model or just a iteration on the same model?
What if it isn't even a re-training from scratch but a fine-tune of an existing model/weights release, is it a new version then? Would be more like a iteration, or even a fork I suppose.
Yes, it's a new model, but not a Claude 4.
It's the same, but a bit different; Claude 3.6 makes sense to me.
I would assume that 3.5 means that the base training (which takes weeks/month) wasn't changed but only finetuning happened.
Could be just additional post-training (aka finetuning) for coding/etc.