My experience is that open weight models from China are at least ~12 months behind. In some workloads they may be closer, in others further away.

I also find that the harness and product you wrap around models can often narrow that gap considerably.

Opus 4.6 for example, on a PR-for-PR basis was head and shoulders above GLM 5.1. Perhaps GLM 5.1 was a bit under Sonnet 4.6 at the time. That's roughly a year or so behind.

Much cheaper though! I'm bullish on open weight models, I have no idea where all these curves will top out, can the frontier labs keep the year plus lead? Do open labs get close enough to SOTA that they gain adoption across many tasks and drive down inference prices??? Who knows, not me.