That's true, but it does mean that the LLM itself actually does have access to those thinking traces and could therefore, at least in principle, answer what it was thinking. They're probably not trained to do that, though.
That's true, but it does mean that the LLM itself actually does have access to those thinking traces and could therefore, at least in principle, answer what it was thinking. They're probably not trained to do that, though.
It depends. Up until recently the models were trained only to "think" on the last user message. So you'd send the message1, got back reply1 w/ think1 but you'd make the next iteration m1 - r1 - m2, and would get back reply2 w/ think2. You would not add the thinking1. That's how the models were trained, and that's how you were supposed to construct the conversation.
Now recently some things have changed, and you can add the thinking part (you get that encrypted from the closed API labs). But the model needs to have been trained for this to work. And doing it this way you'll burn through tokens faster, as the thinking parts are usually rather long.