It actually happens more with these large overparameterized models, because they have more capacity to memorize their training data than smaller models do.