Nothing happened. Judge by the outputs: if they're infringing, the model learned expression; if not, it learned abstractions and styles. Why do you think only two or three lawsuits have focused on output infringement? If regurgitation were a big issue, copyright holders would be raising hell. Instead they focus on training data, which means the outputs are fine.

Anyway, focusing on training data is misguided unless you also restrict in-context learning by policing what users can paste into the model. A model can reproduce copyrighted text handed to it in the prompt just as easily as text it was trained on.