It seems like they should be able to “overweight” newer training data. But the risk is the newer training data is going to skew more towards AI slop than older training data.
There won't ever be newer training data.
The OG data came from sites like Stack Overflow. These sites will stop existing once LLMs become better and easier to use. Game over.
Every time Claude Code runs tests or builds after a change, it's collecting training data.
Has Anthropic been able to leverage this training data successfully?
I can't pretend to know how things work internally, but I would expect it to be involved in model updates.
You need human-language programming questions to train on too, not just the code.
That's what the related chats are for?