Hacker News

But most of the marginal training for Anthropic, OpenAI and Google models is done on LLM-paraphrased user data from those platforms. That user data is proprietary and obviously way more valuable than random books.