Yeah, but if its final performance comes from training on data generated by a bigger model, one can question whether this is really a way to build genuinely new 40B models.