At least according to gertlabs, Qwen3.6 27B outperforms every SoTA (closed) model at Kotlin: https://archive.vn/RYBCL / https://gertlabs.com/rankings?mode=agentic_coding&language=k...
At least according to gertlabs, Qwen3.6 27B outperforms every SoTA (closed) model at Kotlin: https://archive.vn/RYBCL / https://gertlabs.com/rankings?mode=agentic_coding&language=k...
Interesting. I wonder if there is opportunity to train a set of small model variants to excel at a certain stacks. Eg Qwen3.6-27B for Node + React or Qwen3.6-27B for Rust + TUI
This is always how I've imagined small/consumer-hardware models going in time. If I only ever code in Python, give me a model that does just that (plus some general CS, algorithms, structure, etc.) and does it super-fast and well. Make it small enough that if I need a Python back end and an HTML front end, another specific model can load alongside and collaborate on the front end.
Or give me a pure shopping model that has a general understanding of products and product categories, and then will playwright/scrape/API into shopping sites to compare options and find me what I want. Etc.
Qwen 3.6 27B is an anomalously strong all-around model for its size, but when we run our evaluations, we generate 10 coding submissions/language/model (110 total). So full discosure, the per-language per-model performances can be noisy (I do not think Qwen3.6 27B is better than Fable 5 in agentic workflows when writing Kotlin, given enough samples, although we do find some interesting anomalies that hold up under large sample sizes).