Do you have a feel for how Qwen 3.6 compares to Sonnet 4.6? Because in reality, that's what we use a lot. If we just used Opus 4.7 for everything code related, we'd have a monthly bill 10-20 times higher than using Sonnet where we can.
I think you could well be surprised by the Sonnet vs Opus bill (assuming you are paying via the API)
In my experience Sonnet bills can be higher than Opus because it churns a lot more trying to get things right.
Example from my fairly simple but agentic benchmark:
Opus 4.7, 25/25, 81c: https://sql-benchmark.nicklothian.com/?highlight=anthropic_c...
Opus 4.6, 24/25, 61c: https://sql-benchmark.nicklothian.com/?highlight=anthropic_c...
Sonnet 4.6, 24/25, 41c: https://sql-benchmark.nicklothian.com/?highlight=anthropic_c...
I only tested the free OpenRouter version of Qwen 3.6 Plus, and it scored 23/25: https://sql-benchmark.nicklothian.com/?highlight=qwen_qwen3....
This doesn't quite show Opus being cheaper, but at roughly 2x the Sonnet cost on this run it's nowhere near 10-20 times more either. Harder tasks close the gap even further.
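To make the arithmetic behind that concrete, here is a rough sketch using only the three Anthropic results quoted above. It assumes "c" means US cents for the whole 25-task run; the dictionary layout and function names are just illustrative, not anything from the benchmark itself.

```python
# Scores and costs copied from the benchmark results quoted above.
# Assumption: "c" is US cents for the full 25-task run.
runs = {
    "Opus 4.7": {"passed": 25, "cost_cents": 81},
    "Opus 4.6": {"passed": 24, "cost_cents": 61},
    "Sonnet 4.6": {"passed": 24, "cost_cents": 41},
}

# Sonnet 4.6 is the baseline the other models are compared against.
baseline_cents = runs["Sonnet 4.6"]["cost_cents"]


def cost_ratio(model: str) -> float:
    """Total run cost of `model` relative to the Sonnet 4.6 run."""
    return runs[model]["cost_cents"] / baseline_cents


def cents_per_pass(model: str) -> float:
    """Run cost divided by the number of tasks the model passed."""
    return runs[model]["cost_cents"] / runs[model]["passed"]


for name in runs:
    print(
        f"{name}: {cost_ratio(name):.2f}x Sonnet, "
        f"{cents_per_pass(name):.2f}c per passed task"
    )
```

On these numbers Opus 4.7 comes out at about 1.98x the Sonnet bill and Opus 4.6 at about 1.49x, so the gap is there but it is an order of magnitude smaller than 10-20x.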
I would say if Sonnet is a senior engineer, then Qwen3.6 (the 27b model) is probably closer to a junior engineer. Still capable of getting stuff done, just needs more guidance and makes mistakes more often.
Maybe that's underselling it. It is quite a good model and might end up replacing a lot of the work I was sending to Sonnet 4.6.
Also, Sonnet 4.6 is almost certainly a much bigger model, so the performance differences aren't unexpected.