Well, it is a Chinese model, maybe it thinks better in Chinese?

Hànzì can use 30%-40% fewer tokens than English. So, yes, it probably thinks better in Chinese.
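A rough way to see where the savings could come from, as a sketch only: the sentence pair below is my own illustrative example, and it counts characters and UTF-8 bytes, not tokens. Real token counts depend entirely on each model's tokenizer, so treat this as intuition rather than a measurement of the 30%-40% figure.

```python
# Illustrative sentence pair (my own example, not taken from any benchmark).
english = "I went to the library yesterday to borrow three books."
chinese = "我昨天去图书馆借了三本书。"


def text_stats(text: str) -> dict:
    """Return character and UTF-8 byte counts for a string."""
    return {"chars": len(text), "utf8_bytes": len(text.encode("utf-8"))}


en = text_stats(english)
zh = text_stats(chinese)

print("English:", en)
print("Chinese:", zh)
print("char ratio zh/en:", round(zh["chars"] / en["chars"], 2))
print("byte ratio zh/en:", round(zh["utf8_bytes"] / en["utf8_bytes"], 2))
```

Fewer characters does not automatically mean fewer tokens. A byte-level tokenizer can split a single Hanzi into several tokens, so the actual savings depend on how well the tokenizer's vocabulary covers Chinese.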

There was a funny suggestion online about using Classical Chinese to reason (it has a status similar to that of Latin in Europe, and it uses at least 50% fewer characters, probably with similar savings in tokens). I don't know whether the reasoning quality was on par with modern languages, but it was worth a laugh.

If so, would other models like ChatGPT benefit from translating the user's prompt to Chinese/Japanese and thinking in Hanzi/Kanji and then converting the response back to the user's language before displaying it?

Yeah, it’s why the Caveman skill includes a Wenyan mode.

https://github.com/JuliusBrussee/caveman

I believe that most reasoning models actually think in their own "language", which is not really understandable by humans. The thinking traces shown in the UI are actually summaries generated by a smaller model in plain English (or the user's language). Sometimes this leaks through and you see some Chinese/Japanese characters in, e.g., Claude's reasoning.

Wait, this isn't real, is it? Is there actually an intermediate model that translates DeepSeek's thinking from its "alien language" into human languages? That's not actually the case, right?

I thought "thinking" is literally the model generating additional text in a human language that shows its "thought process". It's added to the model's context, which helps it reason better because it now has this self-generated context.

The "their own language" idea seems to come from some recent science fiction where LLMs develop their alien language and take over the world by 2037 or something.

Yeah, it's actually the case. Researchers have shown that the model's response doesn't always follow from the reasoning. Whether you consider that an internal language or not really depends on what you're speculating the neural network is doing. I think there was an Anthropic paper on it.

You're right, it's just additional text that allows it to do thinking / reasoning-like behavior. The big proprietary models hide the real output from the user and instead provide a friendly abridged version, but that's just to protect their secret sauce from distillation.

The parent is off, you’re right. They may reason in any language, typically whatever the user’s language is, and you’ll see the reasoning directly with an open model like DeepSeek.

Research only showed that thinking might be disconnected from the final output, but in my experience they are very strongly correlated in recent models.

> Research only showed that thinking might be disconnected from the final output

It is trivial to regularly spot obvious contradictions and inconsistencies if you read carefully. For example I've encountered traces that amounted to "I can deduce X, therefore Y, so that means Z" but then the model turns around and outputs "the answer is W because X". It's even been demonstrated that having the model output placeholder tokens or other gibberish instead of "thoughts" still improves performance. However the thinking traces can still be useful to the end user regardless.

Current models simply generate additional text that gets added to the context for the trace. However iterative models that "think" by repeatedly looping through several layers instead of outputting text have recently been demonstrated.

As far as I'm aware, it's not true for DeepSeek or other Chinese open-weight models (at least those that I have seen). Their reasoning traces are composed entirely of some human language, be it English, Chinese or another one. By the way, most of them adapt their reasoning to the user's language: for example, if the user writes in English, the reasoning will more likely be in English.

I think the DeepSeek problem (thinking and replying in Chinese) has a simpler explanation: in their official chat, they're probably using some kind of system prompt that is (probably) written in Chinese, which is why the model may prefer Chinese in its output.

I have seen mixed-language thinking from Claude when I speak to it in English but we are discussing a product that's in Spanish, or searching Amazon Spain.

Summaries by different smaller models are usually made by closed proprietary models like Claude as a way to combat the distillation of real reasoning traces by competitors. Open weight models show the real reasoning traces. Reasoning traces operate in the same space as the non-reasoning output. It's all just one large text for an LLM. Internally, reasoning is just ordinary chat completion between <think></think> tags.
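To make the last point concrete, here is a minimal sketch of separating the reasoning from the final answer in a raw completion. The `raw` string is a made-up example and `split_reasoning` is a hypothetical helper of mine, not any library's API. It assumes the model emits a single `<think>...</think>` block, which is how I understand the open-weight models to behave.

```python
import re

# Made-up raw completion, standing in for what an open-weight model returns.
raw = "<think>The user asks 2+2. That is 4.</think>The answer is 4."

# DOTALL lets the reasoning span multiple lines; non-greedy stops at the first close tag.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer)."""
    match = THINK_RE.search(text)
    if match is None:
        # No think block: the whole completion is the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is treated as the visible answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```

The point is that nothing special happens at the tag boundary: both parts come out of the same token stream, and the UI just decides which part to show.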

This is inaccurate. The displayed reasoning traces are summaries, but the model thinks in nominally regular human languages. AI labs are very light on details (they consider them their "edge"), but both GPT5.5 and Claude Mythos/Fable system cards discuss chain-of-thought monitorability quite a bit.

They occasionally show snippets of CoT in papers they write, e.g. for o3/o4/GPT5 models [1] or Claude 3.5 Haiku [2].

[1]: https://openai.com/index/evaluating-chain-of-thought-monitor...
[2]: https://transformer-circuits.pub/2025/attribution-graphs/bio...

> summaries generated

Or hallucinated

There are other, even more efficient ways of doing this, e.g. using images instead of raw text: https://xcancel.com/karpathy/status/1980397031542989305?lang...

But why does it do so inconsistently, sometimes even forgetting to swap back to English when it comes time to do 'normal' output? It also seems recent: when I was using DeepSeek even a week ago, this was very rare compared to what I was seeing yesterday. I had to start including a line asking it to stick to English because I can only speak/read English.

A Chinese model which tells me it is Claude from Anthropic? Not really. Chinese HW yes, SW no.

I've seen that people can get Claude and friends to say they're DeepSeek if they ask in Chinese. I think distillation is happening all the time.

Google Chrome tells me it's like 14 different things. How is that any different than DeepSeek saying it is Claude?

I guess Claude isn’t an American model either considering how Anthropic has fed basically all of the globe into it.