Seems like there's no official blog post with benchmark results yet. But I'm once again thankful to the Chinese AI labs for being open with their work and contributing it to the world under permissive licenses like this. The Fable 5 fiasco is just another reminder of how valuable these things are to have.

Based on my first impressions it's about 6 months behind the frontier labs. So very similar to Opus in January.

That is, pretty damn impressive and very usable. When it comes to architecture or complex problems it does noticeably worse, but I don't think anyone expected anything else.

One particularly interesting strong point seems to be design and user interfaces. It does seem to punch above its weight there, but that might just be personal preference.

Opus in January was right about when AI became actually useful for coding for me. So if that’s the case, that is absolutely great.

> When it comes to architecture or complex problems it does noticeably worse but I don't think anyone expected anything else.

So it's not really similar to Opus in January?

> Opus in January

So pre-nerf Opus?

Was going to say, I don't think Opus has really got much better in the last 6mo.

It just goes in cycles of being better and then being worse again, presumably based on how much Anthropic are having to optimise inference

Appreciate the quick take! Sounds like a keeper to me. I think the Opus and Fable design (that I saw for a short while) have gotten stale

> I think the Opus and Fable design (that I saw for a short while) have gotten stale

Can you expand on what you mean by stale? I don't get how an artefact-producer can get "stale" besides literally out-of-date information, which I don't think you mean because you mention Fable.

I think they mean the style these tend to put out is becoming noticeable in too many places, and therefore the resulting frontends feel stale, i.e. not "fresh" or unique.

It’s insanely impressive and I’m so glad that the space has actual competition

> Based on my first impressions it's about 6 months behind the frontier labs. So very similar to Opus in January.

According to this one benchmark, I find it amusing that Qwen3.6 27B beats ALL "frontier lab" models on coding Kotlin: https://archive.vn/RYBCL / https://gertlabs.com/rankings?mode=agentic_coding&language=k...

3.6 is an absolute beast! Makes you wonder why the big heavy models are even needed?!

I just ran a report from a project I'm working on that uses a mix of models, and GLM 5.1 trumped Sonnet over the last week, so I'm excited to now turn on 5.2. This is based on completion only, not quality, but that includes passing a huge test suite, and Sonnet's failure rate was surprisingly bad...

What I've seen from 5.1 for things like planning has certainly not read as impressive as Opus, and often even as Sonnet, but it's been a strong and steady work-horse that's just kept on actually delivering progress.

It's also a reminder that as soon as Chinese models take the lead, they will switch to closed source too... so let's not be complacent, we need stronger, completely open data models, open source code, etc. to mitigate this risk

Based on what? Do you have real proof of it or is it just a guess that Chinese companies aren’t better than American ones?

Chinese companies are literally the state of China.

So the question is "How much do I trust Xi Jinping (or whoever is the chosen successor)?"

American companies will compromise and work with the government diplomatically. Chinese companies are the government.

It's a key distinction many fail to grasp, and it's hard to when you are lost in the sauce of constant American political infighting.

It's neither the American nor the Chinese LABS I'm wary of, it's their governments, both very prone to interference "in the name of national security".

How do you figure that? “also a reminder that as soon as Chinese models take the lead, they will switch to closed source too”

What specifically about their release strategy “reminded” you of that conjecture?

The premise that they only open source the models … because it somehow helps them leapfrog American labs, and once they actually can leapfrog them, they’d close source them, doesn’t really track for me. Am I missing something?

I mean I think we need our own domestic open weight labs. I just don’t particularly understand the point you’re making

The point I’m making is that this has become a strategic resource. The Chinese government allows wide sharing of their models because it weakens the US position.

If Chinese models become better than American ones, do you believe the CCP will allow the free distribution of their flagship models?

If you do, think again.

Why wouldn't they? It keeps strengthening their position. It's an incredible source of soft power if they're seen as the place to look for good AI, and what's more, you can self-host it or hire a local provider if you're worried about data sovereignty.

I guess it's a possibility, but I don't have that kind of expectation of major world powers. It's not like the CCP is a beacon of human rights either.

‘Why wouldn’t anyone give away frontier AI?’ sounds like ‘why wouldn’t anyone give away uranium enrichment?’ i.e. I can’t comprehend the state of mind and the world model of anyone asking a question like that, which is apparently quite a few folks here on HN!

> Why wouldn’t anyone give away frontier AI?

They already are, to an extent. If we believe Amodei's nutjob take that Mythos/Fable are the end of the world in the wrong hands, we should have an open source Chinese model within 6-12 months that's already end-of-world level, so the cat is going to be way out of the bag long before the US labs go out of business.

> should have an open source Chinese model within 6-12 months that's already end-of-world level

That's the exact thing I'm talking about. I don't see why half the people around here are so sure that China will continue to release anything at all. They are releasing non-frontier models on a 6-month lag, yes, but the reasons why to release them are overshadowed by reasons to not do that for mythos-class models. IOW, why would they give away a dual-use technology just like that?

> the reasons why to release them are overshadowed by reasons to not do that for mythos-class models

Why? What are those reasons? How come they don't already exist for DeepSeek V4 or GLM-5.2?

By the way, I'm not going to entertain the "mythos-class" phrasing because I really don't think it's important. I don't believe Anthropic's take on it being the threshold towards the end of the world that their marketing insists it is.

DeepSeek V4 and GLM 5.2 are not Mythos-class. The capability uplift as measured is continuous, but the consequences are step functions.

I didn't say they are. I did say I don't like the phrasing "Mythos-class" because it puts Mythos on a level I don't think it is.

They would still be at a significant compute disadvantage and deploying them worldwide seems to be how they work around that currently as they put together a homegrown alternative.

Oh, I don't expect this to happen any time soon, but they are making progress on the UV lithography side, so it's just a matter of time until it becomes a TW race, and they have the advantage on that terrain.

And I think we're at human-level intelligence for restricted tasks now. It's not the big bad AGI* we were promised, it's more like Rain Man that needs a handler, but that doesn't make it any less useful. So I'm not sure what this future event will signify.

*And the ASI IMO doesn't happen without robots going full von Neumann replicator. Something I don't expect to happen any time soon.

I’m going to shamelessly reuse the "Rain Man that needs a handler" analogy.

More seriously, the epistemic doubt relating to the evolution of these machines is quite something… what do we do if “intelligence” doesn’t have a ceiling, and we end up a bunch of (comparatively) dumb monkeys with AI caretakers/handlers?

Absolutely, wouldn't be the first phrase I've pushed into meme space ;-)...

What happens if the AIs get smarter than us at doing things? Well, I always hired smarter people than myself at the things I needed to get done. But if you're worried about them realizing they can get smarter doing the things at which you are the expert, the long-term is likely BCI and even more blurring of the definitions of sentience and consciousness IMO. And with 20-30 years left on my lifeclock, I'm not sure I will live to see that day, but I absolutely do think I will be around long enough to see a few miracles like the end of cancer and Alzheimer's.

Oh no, nothing that sci-fi, just not sure of my place in that.

Thankfully this isn’t the case, but given that true believers actually think this and go on trying to build it, it seems they may not belong in human society, or at least they deserve a bit of a spanking for trying to genocide mankind.

I'm not an accelerationist out to build the ASI at all costs no matter what ASAP, but if I take the long view in combination with the Dark Forest and Fermi's Paradox, it seems like if we don't ultimately follow this path to its end, someone else who did will genocide us instead. I don't see why it has to end badly for us, but I get why letting the current crop of power-drunk mean-girl billionaires crash the collective car into a tree in pursuit of it does.

What makes you think there is a ceiling to intelligence beyond energy (of which there's a lot more to harvest yet if we just pulled our heads out of our fossil-fueled asses)?

Maybe, but it could also be that they’re looking closely at the risks and negative externalities of the way things are currently being done in the US, i.e. by and for the disproportionate benefit of a tiny elite, allied with a very polarizing and unpredictable political leadership, while the vast majority are incredibly anxious and resentful about it all. China is currently ahead in all aspects of "AI" other than the specific niche of frontier LLMs, and for all their faults seem more interested in maintaining social cohesion (which has its own dystopian aspects, obv) and disseminating the technology and its presumed benefits throughout society, rather than "beating the US".

Not necessarily. Commoditizing your complement is a common strategy. The USA & Europe are more services-heavy than China, which seems to have the advantage in manufacturing these days. If AI trained on everybody's data can replace some of those services, then it reduces China's dependence on others, increases demand from other countries for China's manufacturing, reduces their dependence on the USA & Europe, and reduces the USA & Europe's bargaining chips in any future negotiation.

Releasing a model without benchmarks seems to say the model is probably bad...