You don't get access to the thinking traces. Might work with local models tho, but the current <thinking/> meta isn't particularly suited for this either, as it's a big blob of rambling surfaced by RL, with the "only" objective being that the thinking blob somehow leads to a better final answer. Something more detailed, using templates akin to oAI's harmony could work, provided there's also a step that teaches the models to reflect on the various thinking channels, and maybe surface bits and pieces to include in "skills" or "learnings".

That's true, but it does mean that the LLM itself actually does have access to those thinking traces and could therefore, at least in principle, answer what it was thinking. They're probably not trained to do that, though.

It depends. Up until recently the models were trained only to "think" on the last user message. So you'd send the message1, got back reply1 w/ think1 but you'd make the next iteration m1 - r1 - m2, and would get back reply2 w/ think2. You would not add the thinking1. That's how the models were trained, and that's how you were supposed to construct the conversation.

Now recently some things have changed, and you can add the thinking part (you get that encrypted from the closed API labs). But the model needs to have been trained for this to work. And doing it this way you'll burn through tokens faster, as the thinking parts are usually rather long.