Yes, LLMs fundamentally operate as a lossy compression scheme for their training data. There's been countless examples of them reproducing their training data with very high accuracy

People claim that the data isn't stored, but clearly a representation of it is encoded and reproducible. I saw chatgpt word for word plagiarise a stack overflow comment just two days ago

Does this actually imply a representation of it has been stored or simply that the model is sort of over-fit?

Is there a difference?