The author doesn't explain (or may not know) why this happens. `<think>` and `</think>` are special tokens the model is trained on, and they are part of its vocabulary. For example, here are the `<think>` and `</think>` tokens defined in the [Qwen3 tokenizer config](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507/bl...).
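
A quick way to see this for yourself (a minimal sketch, assuming `transformers` is installed and the checkpoint's tokenizer is downloadable; the printed ids are whatever that vocab assigns):

```python
from transformers import AutoTokenizer

# Load the Qwen3 tokenizer and check how the think tags are encoded.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B-Thinking-2507")

# Each tag should encode to a single token id, because it is a dedicated
# entry in the vocabulary rather than a string split into sub-tokens.
print(tok.encode("<think>", add_special_tokens=False))
print(tok.encode("</think>", add_special_tokens=False))
```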

The model runtime recognizes these as special tokens, and a chat template can be configured to replace them with something else. That is presumably how one provider is modifying the XML namespace, while llama.cpp and vLLM instead move the content between the `<think>` and `</think>` tags into a separate field of the response JSON called `reasoning_content`.
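
As a rough illustration of that second behavior (a hypothetical helper; the real runtimes do this at the token level during decoding, not with a regex over the finished string):

```python
import re

def split_reasoning(text: str) -> dict:
    """Move <think>...</think> content into a separate field, mimicking
    the reasoning_content split that llama.cpp and vLLM perform."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return {"content": text, "reasoning_content": None}
    reasoning = match.group(1).strip()
    # Keep everything outside the think tags as the visible answer.
    content = (text[:match.start()] + text[match.end():]).strip()
    return {"content": content, "reasoning_content": reasoning}

print(split_reasoning("<think>check units first</think>The answer is 42."))
# {'content': 'The answer is 42.', 'reasoning_content': 'check units first'}
```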