Comparing open-source models like Qwen against Anthropic's models is foolish. First, Anthropic has never disclosed the parameter counts or architectures of its models, so there is no apples-to-apples baseline. Second, it's well known that these open-source models more or less distill from other models and use mixture-of-experts (MoE), which lets them run at much lower computational cost. Using Qwen as a comparison point only shows the blog post's author hasn't thought this through. The article devotes such a large portion to discussing Qwen on OpenRouter that I find it hard to take seriously.

Anthropic is obviously also aware of the benefits of MoE and of distilling a larger model into a smaller one, so they could run a model the same size as Alibaba's at the same inference cost if they wanted to, or a slightly larger model at slightly higher cost. They definitely aren't running a much larger model (except perhaps as a teacher for distillation training), because then they wouldn't be able to hit the output speeds they're hitting.
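The cost argument above comes down to active parameters: per-token inference FLOPs scale with the parameters actually used per token, not the total count, so an MoE model that activates only a fraction of its weights is proportionally cheaper to serve than a dense model of the same total size. A back-of-the-envelope sketch, using the common approximation FLOPs/token ≈ 2 × active parameters (all parameter counts here are made-up illustrations, not real figures for any Anthropic or Alibaba model):

```python
# Rough per-token inference cost, approximated as 2 x active parameters.
# Every parameter count below is a hypothetical example.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params

dense = flops_per_token(200e9)  # hypothetical 200B dense model: all weights active
moe = flops_per_token(22e9)     # hypothetical 1T-total MoE with only 22B active/token

print(f"dense: {dense:.2e} FLOPs/token")
print(f"moe:   {moe:.2e} FLOPs/token")
print(f"MoE is ~{dense / moe:.0f}x cheaper per token")
```

This is why a dense model and a much larger MoE model can land at similar serving costs and output speeds: what matters at decode time is the active-parameter footprint, which is also why sustained output speed puts a ceiling on how large the active model behind an API can plausibly be.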