Hacker News

Yes, LLMs fundamentally operate as a lossy compression scheme for their training data. There's been countless examples of them reproducing their training data with very high accuracy

People claim that the data isn't stored, but clearly a representation of it is encoded and reproducible. I saw chatgpt word for word plagiarise a stack overflow comment just two days ago