In my case, I’m paying for inference on the original models from e.g. Fireworks, so it’s not a quantization problem. The Qwen3 I was using was the new 458B (I think that’s the size?) model that was their top performer for code.

I agree with other comments that there are productive uses for them. Just not on the scale of o4-mini/o3/Claude 4 Sonnet/Opus.

So imo, larger open-weights models from big US labs are a big deal! Glad to see it. Gemma models, for example, are great for their size. They’re just quite small.