I think it's the default behavior, because it's cheaper and faster to produce than the real answer.

I assume the beginning of the answer is handed to a cheaper, faster model, so that the slower, more expensive one has time to think.

It keeps the conversation lively and natural for most people.

It would be interesting to test whether that's true: disable it with a system prompt, then measure whether the time to the first word of the answer gets slower.
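A rough sketch of how that measurement could look in Python. Everything here is my own assumption: `fake_stream` is a hypothetical stand-in that just sleeps and then yields words, and you would swap it for whatever streaming call your provider actually offers, once with the default setup and once with the system prompt that disables the opener. The timer starts before the stream is created, so the request latency is included.

```python
import time
from statistics import median
from typing import Callable, Iterable, Iterator


def time_to_first_chunk(make_stream: Callable[[], Iterable[str]]) -> float:
    """Seconds from starting the request until the first non-empty chunk."""
    start = time.perf_counter()
    for chunk in make_stream():
        if chunk.strip():
            return time.perf_counter() - start
    raise ValueError("stream ended without producing any text")


def median_latency(make_stream: Callable[[], Iterable[str]], runs: int = 5) -> float:
    """Median time-to-first-chunk over several fresh streams."""
    return median(time_to_first_chunk(make_stream) for _ in range(runs))


def fake_stream(delay: float, text: str) -> Iterator[str]:
    """Hypothetical stand-in for a real streaming call: wait, then yield words."""
    time.sleep(delay)
    for word in text.split():
        yield word + " "


if __name__ == "__main__":
    # The delays are made-up numbers for the demo, not measurements.
    default = median_latency(lambda: fake_stream(0.01, "Great question! The answer is 42."))
    disabled = median_latency(lambda: fake_stream(0.05, "The answer is 42."))
    print(f"default: {default:.3f}s, opener disabled: {disabled:.3f}s")
```

Taking the median over several runs matters because network jitter on a single request can easily be larger than the effect you are looking for.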