My own guess is that something like this happened:

Claude in testing would interrupt too much to ask clarifying questions. So, as a heavy-handed fix, they turned down the sampling probability of the <end of turn> token, which is what hands control back to the user for clarification.
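A minimal sketch of what this kind of patch might look like mechanically. Everything here is illustrative, not Anthropic's actual implementation: the function name, the token id, and the bias value are all assumptions; the point is just that a single additive penalty on one token's logit shifts probability mass away from ending the turn.

```python
import math

def sample_probs_with_eot_penalty(logits, eot_id, bias=-5.0):
    """Apply an additive bias to the end-of-turn token's logit,
    then softmax into sampling probabilities.

    A negative bias makes the model less likely to end its turn,
    without retraining anything else.
    """
    logits = list(logits)
    logits[eot_id] += bias
    # numerically stable softmax
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical 3-token vocabulary where index 1 is <end of turn>
logits = [1.0, 2.0, 0.5]
baseline = sample_probs_with_eot_penalty(logits, eot_id=1, bias=0.0)
patched = sample_probs_with_eot_penalty(logits, eot_id=1, bias=-5.0)
```

With the penalty applied, the <end of turn> probability drops sharply while the distribution still sums to 1, so the model keeps talking instead of yielding the turn.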

So it doesn't hand back to the user, but the internal layers expected an end of turn, and you get this weird self-answering behaviour as a result.

As an aside, my big reason for believing this is that this sort of dumb, simple patch laid on top of an existing behaviour is often the kind of solution optimizers find. Say you made a dataset with lots of pairs, where one side has lots of <end of turn> tokens and the other side does not. The harder thing to learn tends to be "ask fewer questions and work more autonomously", while the easy thing to learn, "emit fewer end-of-turn tokens", tends to get learned much faster.