Fable/Mythos are the first models from Anthropic that hide 100% of reasoning tokens. So it seems to me we're about to get a lot more data on how much of Chinese model progress has been a consequence of distillation techniques.
This isn't correct. Claude hasn't displayed the raw chain of thought for any of the Claude 4 series models, which first shipped in May 2025. Sonnet 4.6/Opus 4.8 only display a summarized chain of thought, which is produced by a secondary model. Fable displays its summarized chain of thought in the same manner.
The thinking traces disappeared because Anthropic changed them to be hidden by default. The rationale for hiding them was that most people don't look at the thinking traces (https://news.ycombinator.com/item?id=47664442). You can re-enable thinking traces in Claude Code settings with the flag showThinkingSummaries: true.
I qualified it with "100%" because Claude 4 models show the first few lines of the chain of thought:
>On Claude 4 models, the first few lines of thinking output are more verbose, providing detailed reasoning that's particularly helpful for prompt engineering purposes. Claude Mythos Preview summarizes from the first token, so its thinking blocks do not show this verbose preamble.
https://platform.claude.com/docs/en/build-with-claude/extend...
Hiding them by default should make it easier to identify who’s scraping even those summarized CoTs, I figure. I also figure it’s very expensive to summarize every single chat on the service.
Google can't do it, OpenAI didn't do it, Meta didn't do it, etc.
Why do you think China will?
I am quite certain the gap will only grow.
DeepSeek took everyone by surprise with R1. I'm pretty sure they and/or others will do the unexpected again. It's not as if the US has a monopoly on awesome talent.
A quick look at the author names on most recent big AI papers says that's an understatement. The US's comparative advantage is in data centers, not in expertise.
Of course, they're betting they won't need those experts soon.
The question you (and parent) are dodging is: why don't Google and Meta catch up?
US closed labs don't release papers, as I'm sure you know.
US companies used to release lots of papers. You can go back to when they did, and see for yourself how many Chinese names there are on them. Or you can look at those which still publish a lot, like Nvidia.
Google is catching up, especially by the metric of "making money".
Meta from the start clearly had a strategy not of competing to dominate, but of preventing their competitors from dominating by releasing open models and software. (You have them to thank that you're not working in TensorFlow right now.)
That's a skill issue, not a technology issue.
How so? The fact that China is going to launch a comparable model soon is the whole point of what’s happening now. Everyone knows there are going to be open models soon that have the same capabilities - Anthropic has literally said so. The restrictions now are to buy time to patch the security issues before those Chinese models are made available to the whole world.
It would take less than a month if not for the restrictions. One reason is that they might be using distillation to reach parity.
Oh, do you not pay attention to the hardware they're allowed to buy from Nvidia? At this point, it's more that they're being nerfed than that they're able to do the magical training stuff.