Even if it's larger, OpenRouter has DeepSeek v3.2 (685B/37B active) at $0.26/0.40 and Kimi K2.5 (1T/32B active) at $0.45/2.25 (mentioned in the post).

Opus 4.6 likely has on the order of 100B active parameters. OpenRouter lists the following throughput for Google Vertex:

    42 tps for Claude Opus 4.6 https://openrouter.ai/anthropic/claude-opus-4.6
    143 tps for GLM 4.7 (32B active parameters) https://openrouter.ai/z-ai/glm-4.7
    70 tps for Llama 3.3 70B (dense model) https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
For GLM 4.7, that makes 143 * 32B = 4576B parameters read per second, and for Llama 3.3 we get 70 * 70B = 4900B, which makes sense: dense models are easier to optimize, so they reach a slightly higher rate. Using the GLM rate as the hardware's throughput, we get 4576B / 42 ≈ 109B active parameters for Opus 4.6 as a lower bound. (This assumes all three models use the same number of bits per parameter and run on the same hardware.)
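The arithmetic above can be sketched as a few lines of Python (figures are the OpenRouter/Vertex numbers quoted above; the shared-hardware and shared-precision assumptions still apply):

```python
# Back-of-envelope estimate of Opus 4.6's active parameter count.
# Assumes all models use the same bits/parameter and run on the same hardware.
# Throughput figures (tokens/sec) from OpenRouter's Google Vertex listings.

glm_tps, glm_active = 143, 32e9        # GLM 4.7: 32B active parameters
llama_tps, llama_params = 70, 70e9     # Llama 3.3 70B: dense
opus_tps = 42                          # Claude Opus 4.6

# Effective "parameters read per token" times tokens/sec:
glm_rate = glm_tps * glm_active        # ~4.58e12 params/sec
llama_rate = llama_tps * llama_params  # ~4.90e12 params/sec

# Using the GLM rate as the hardware's capability, back out Opus's size:
opus_active = glm_rate / opus_tps
print(f"~{opus_active / 1e9:.0f}B active parameters")  # ~109B
```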

Yep, you can also run a similar analysis against Amazon Bedrock, which serves Opus as well.

In reality, I'd say Opus is roughly 2x to 3x the price of the top Chinese models to serve.