They work better for coding workloads. Essentially, the more regular the output, the more the faster model gets right, the less the big model has to do.

Writing tends to have more false positives. I haven't tried this particular one, however, but that is the general trend.