Claude subscription is equivalant of spot instance
And APIs are on-demand service equivalant.
Priority is set to APIs and leftover compute is used by Subscription Plans.
When there is no capacity, subscriptions are routed to Highly Quantized cheaper models behind the scenes.
Selling subscription makes it cheaper to run such inference at scale otherwise many times your capacity is just sitting there idle.
Also, these subscription help you train your model further on predictable workflow (because the model creators also controls the Client like qwen code, claude code, anti gravity etc...)
This is probably why they will ban you for violating TOS that you cannot use their subscription service model with other tools.
They aren't just selling subscription, but the subscription cost also help them become better at the thing they are selling which is coding for coding models like Qwen, Claude etc...
I've used qwen code, codex and claude.
Codex is 2x better than Qwen code and Claude is 2x better than Codex.
So I'd hope the Claude Opus is atleast 4-5x more expensive to run than flagship Qwen Code model hosted by Alibaba.
> Claude is 2x better than Codex
This hasn't been true in a long time.
Not only that, but since the release of 5.4 and 5.3 codex I've been running them in parallel and I've been let down by Opus 4.6 with maximum thinking way more than I've been let down with OpenAI models.
In fact I'm more and more inclined to run my own benchmarks from now on, because I seriously distrust those I see online.
Even if the benchmarks are indeed valid, they just don't reflect my use cases, usages and ability to navigate my projects and my dependencies.
imho they're mostly better at a subset of different tasks. I find codex to be better at reasoning through bugs and reviewing code when compared to Opus, but for writing code I find Claude a lot better.
Maybe that's just CLAUDE.md and memory causing the difference of course.
As a matter of preference however I like the way Claude Code works just a lot better, instructing it to work with parallel subagents in work trees etc. just matches the way I think these things should work I guess.
My impression as well, especially since 5.2 which I felt was on par or better than Opus 4.5
> When there is no capacity, subscriptions are routed to Highly Quantized cheaper models behind the scenes.
Have they announced this?
> Have they announced this?
No and indeed they have said they never do this at all.
[dead]