It’s just doing exactly what it’s designed to do: generate text that’s consistent with its prompts.

People often seem to get confused by all the anthropomorphizing done around these models. The text labelled “thinking” isn’t actually thinking; it’s output generated in response to the prompt, just like any other text the model produces.

That text can still help the model reach a better result, because it becomes part of the context for subsequent generation, essentially giving the model more to go on.
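A minimal sketch of that mechanism, with a stand-in `fake_model` function in place of a real LLM (everything here is hypothetical, purely to show the flow): the “thinking” output is simply concatenated into the context that the next generation step conditions on.

```python
# Toy sketch (no real model involved): each generation step conditions
# on everything produced so far, so "thinking" text is just more prompt.

def fake_model(context: str) -> str:
    # Stand-in for next-chunk prediction; a real LLM would sample
    # tokens conditioned on the full context string.
    if "<think>" in context and "</think>" not in context:
        return "2 + 2 = 4</think>"
    return "The answer is 4."

def generate(prompt: str) -> str:
    context = prompt + "<think>"      # model is asked to "think" first
    context += fake_model(context)    # "thinking" text appended to context
    context += fake_model(context)    # final answer conditions on it too
    return context

print(generate("What is 2 + 2? "))
```

The point is only that nothing special happens to the “thinking” text: it flows back in as ordinary input, which is why it can improve the final answer.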

In that sense, it’s a bit like a human thinking aloud, but crucially, as your example shows, it’s not based on the model’s “experience”; it’s based on what the model statistically predicts a person might say under those circumstances.