It actually happens more with these large overparameterized models, because they have the capacity to memorize more than smaller models.
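As a rough illustration of what "capacity to memorize" means, here is a toy sketch. It is not a neural network: a polynomial with as many coefficients as there are data points stands in for the large model, and a two-parameter straight line stands in for the small one. The labels are arbitrary values made up for this example, so there is no real pattern to learn, and the function names (`fit_line`, `fit_interpolant`, `train_error`) are hypothetical.

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Small model: least-squares straight line (2 parameters)."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_interpolant(xs, ys):
    """Large model: Lagrange polynomial with len(xs) coefficients.

    It has enough parameters to pass through every training point exactly.
    """
    def predict(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = Fraction(yi)
            for j, xj in enumerate(xs):
                if j != i:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return predict

def train_error(model, xs, ys):
    """Sum of squared errors on the training points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Assumed toy data: inputs 0..7 with arbitrary +/-1 labels (no pattern to learn).
xs = list(range(8))
ys = [1, -1, -1, 1, -1, 1, 1, -1]

small_err = train_error(fit_line(xs, ys), xs, ys)
large_err = train_error(fit_interpolant(xs, ys), xs, ys)

print("small model training error:", float(small_err))
print("large model training error:", float(large_err))  # exactly 0: every label memorized
```

The high-capacity fit reproduces every arbitrary label exactly, while the line cannot, which is the sense in which more capacity allows more memorization.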