Humans often answer with fluff like "That's a good question, thanks for asking, [fluff, fluff, fluff]" to give themselves more breathing room until the first 'token' of their real answer. I wonder if any LLMs are doing something like that for latency hiding?
I don't think the models themselves do this; time to first token is more of a hardware thing. But people writing agents definitely do, particularly in voice, where it's worth using a smaller local LLM to handle the acknowledgment before handing off to the main model.
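Roughly, the pattern is to kick off the slow request immediately and let the small model's filler play during the big model's time to first token. A minimal sketch, assuming hypothetical `speak`, `small_model_ack`, and `large_model_answer` helpers standing in for whatever TTS and model clients you actually use:

```python
import asyncio

async def small_model_ack(question: str) -> str:
    # Stand-in for a fast local model; in practice this returns in ~100 ms.
    await asyncio.sleep(0.1)
    return "Good question, let me think about that for a second."

async def large_model_answer(question: str) -> str:
    # Stand-in for the slower remote model with a long time to first token.
    await asyncio.sleep(2.0)
    return "Here is the substantive answer..."

async def speak(text: str) -> None:
    # Stand-in for the agent's TTS output.
    print(text)

async def answer(question: str) -> None:
    # Start the slow request immediately so it runs in the background
    # while the acknowledgment is generated and spoken.
    answer_task = asyncio.create_task(large_model_answer(question))
    await speak(await small_model_ack(question))
    await speak(await answer_task)

asyncio.run(answer("Why is the sky blue?"))
```

From the user's side the perceived latency is just the small model's response time, since the acknowledgment plays while the real answer is being generated.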
Do humans really do that often?
Coming up with all that fluff would keep my brain busy, meaning there's actually no additional breathing room for thinking about an answer.
People who professionally answer questions do that, yes. E.g. politicians, corporate press secretaries, or even just your professor taking questions after a talk.
> Coming up with all that fluff would keep my brain busy, meaning there's actually no additional breathing room for thinking about an answer.
It gets a lot easier with practice: your brain caches a few of the typical fluff routines.