The takeaway is that this is a smaller model that competes with Haiku. I hope they follow up with a "Sonnet"-competing model, then an Opus one. I have been wondering why Microsoft is kind of "sleeping" on offering its own models in Copilot. Maybe it was part of their deal with OpenAI? Not sure.
Yes, it's a "smaller" (137B) model that competes with Haiku, but its performance is basically that of Qwen3.6-35B-A3B, which is 75% smaller and 98% smaller in terms of active parameters (since it's a mixture of experts model). Microsoft should be comparing its model to good smaller models, not Haiku 4.5.
Qwen3.6-27B is closer to Claude Opus 4.7 than it is to Haiku 4.5 in a lot of benchmarks, and it's way smaller than Microsoft's new model.
Sure, it competes with Haiku, but it shows how far behind Microsoft is compared with the many other small models that are available.
> 98% smaller in terms of active parameters (since it's a mixture of experts model).
I don’t think that’s right. This flash model has 5B active params; Qwen3.6-35B-A3B has 3B, so it's 40% smaller.
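For anyone who wants to check the arithmetic, here is a quick sketch. It takes the parameter counts quoted in this thread at face value (137B total and 5B active for the Microsoft model, 35B total and 3B active for Qwen3.6-35B-A3B); none of these are independently verified, and the helper name `percent_smaller` is just made up for illustration.

```python
def percent_smaller(small: float, large: float) -> float:
    """Return how much smaller `small` is than `large`, as a percentage of `large`."""
    return (large - small) / large * 100


# Parameter counts in billions, as quoted in this thread (assumed, not verified).
MAI_TOTAL, MAI_ACTIVE = 137, 5
QWEN_TOTAL, QWEN_ACTIVE = 35, 3

# Total parameters: Qwen vs. the Microsoft model.
total_gap = percent_smaller(QWEN_TOTAL, MAI_TOTAL)

# Active parameters: like-for-like comparison (3B vs. 5B).
active_gap = percent_smaller(QWEN_ACTIVE, MAI_ACTIVE)

# Mixed comparison: Qwen's active params vs. the Microsoft model's total params.
mixed_gap = percent_smaller(QWEN_ACTIVE, MAI_TOTAL)

print(f"total params:      {total_gap:.1f}% smaller")
print(f"active vs. active: {active_gap:.1f}% smaller")
print(f"active vs. total:  {mixed_gap:.1f}% smaller")
```

The like-for-like active comparison gives 40%. A figure near 98% only shows up when 3B active is measured against the full 137B, which may be where the number in the parent comment came from.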
I understand what you’re saying, but I am generally very careful when comparing models and their benchmarks; benchmarks often don’t really match “real world” quality.
The technical report https://microsoft.ai/wp-content/uploads/2026/06/main_2026060... has a lot of detail about decontaminating their training data and developing new in-house benchmarks to ensure reliable evaluation. If other models were just overfit to public benchmarks while Microsoft produced something that generalizes better to unseen data, they could've used those in-house benchmarks to argue that point.
Instead, they only do cherry-picked comparisons against Anthropic's small models, and not the full spectrum of competitors.
Without evidence to the contrary, I'll interpret this as just what happens when you're late to the party and insist on doing everything from scratch.
Maybe coaxing reasoning behavior out of their base model without kickstarting it by distilling from existing models provided them with valuable experience that will help improve their future models, or maybe it was an unnecessary waste of time.
If their model was trained purely on properly licensed data, the reduced legal liability could be a selling point.
They did release MAI-Thinking-1 to compete with Sonnet. Not sure at all why that isn't at the top here.
You can't use MAI-Thinking-1 yet, though? [0] And there's no indication of it being made available in GitHub Copilot, either.
[0] Not even here: https://playground.microsoft.ai/
Good question, and I missed that entirely!
Compete? It is behind Kimi K2.6, which is in turn way behind Sonnet.