> I am saying this probably is "silly behavior by a government" and it is a milestone that points towards what the future may look like. Why can't it be both?
Here is why it's unlikely this is anything other than "silly behavior by a government":
- some benchmarks show GPT-5.5, Gemini 3.1, and even Claude Opus outperforming Claude Fable, and yet it's Fable which is restricted.
- some benchmarks still show the likes of Kimi 2.5 outperforming any Claude model, and DeepSeek is getting equivalent scores (a few tenths of a percent difference)
> Do you think that Chinese labs will continue to release open models forever (...)
That's immaterial to the discussion. Even if China forced Chinese labs to restrict access to all models, the truth of the matter is that the Trump administration's move to restrict access to US-based models does not prevent others from having access to models that are as capable or even better.
So what exactly is the point of this?
You’re completely overrating these benchmarks and it’s landing you at a nonsense opinion. Just actually use the models and you will see that the gap is significant.
It should be easy for a company like Anthropic to prove this beyond a doubt. Why don't they? Why don't they have a collection of prompts and side-by-side comparisons with other models showing how far ahead they are?
I think it's mainly because the difference in models at the frontier isn't "response to prompt X", but rather "coherence with 500K tokens of context and instructions in play"
Good morning to the Anthropic office good sir
I got to try using Fable for a day... it was a clear and definite shift in quality and how independent it is.
It was almost like having another human using and shepherding Opus for me, instead of herding Opus directly myself.
All that says is some benchmarks aren’t worth the tokens it takes to evaluate them. Mythos is clearly capable of finding zero days other models can’t, and Fable is close enough to be lumped with it.
> Mythos is clearly capable of finding zero days other models can’t
I'm unconvinced that this is anything more than proof of work and a marginal improvement that other models will catch up with, perhaps as early as next week. Lots of other current-gen models will find vulns that can be chained together if you're willing to burn enough tokens on the task, and Fable is an absolute token incinerator.
Did you use the models yourself?