Hacker News

The takeaway is that this is a smaller model that competes with Haiku. I would hope they come out with a "Sonnet"-competing model next, then an Opus one. I have been wondering why Microsoft is kind of "sleeping" on offering its own models in Copilot. Maybe it was part of their deal with OpenAI? Not sure.

mdasen a day ago [ - ]

Yes, it's a "smaller" (137B) model that competes with Haiku, but it basically matches the performance of Qwen3.6-35B-A3B, which is 75% smaller overall and 98% smaller in terms of active parameters (since it's a mixture of experts model). Microsoft should be comparing its model to good smaller models, not Haiku 4.5.

Qwen3.6-27B is closer to Claude Opus 4.7 than it is to Haiku 4.5 in a lot of benchmarks - and it's way smaller than Microsoft's new model.

Sure, it competes with Haiku, but it shows how far behind Microsoft is compared to lots of other small models that are available.

IanCal 11 hours ago [ - ]

> 98% smaller in terms of active parameters (since it's a mixture of experts model).

I don’t think that’s right: this flash model has 5B active params. Qwen3.6-35B-A3B has 3B, so it's 40% smaller.
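
For reference, the percentages in this exchange can be checked with a quick sketch. The parameter counts (137B total and 5B active for the Microsoft model, 35B total and 3B active for Qwen3.6-35B-A3B) are simply the ones quoted in this thread and are not independently verified. The helper name `percent_smaller` is made up for illustration, and the guess that the 98% figure came from comparing 3B active against the full 137B is an assumption.

```python
def percent_smaller(small: float, large: float) -> float:
    """Return how much smaller `small` is than `large`, as a percentage of `large`."""
    if large <= 0:
        raise ValueError("large must be positive")
    return (1 - small / large) * 100


# Parameter counts in billions, as quoted in the thread (assumed, not verified).
MAI_TOTAL, MAI_ACTIVE = 137, 5
QWEN_TOTAL, QWEN_ACTIVE = 35, 3

total_gap = percent_smaller(QWEN_TOTAL, MAI_TOTAL)     # total vs. total
active_gap = percent_smaller(QWEN_ACTIVE, MAI_ACTIVE)  # active vs. active
mixed_gap = percent_smaller(QWEN_ACTIVE, MAI_TOTAL)    # active vs. total (assumed origin of "98%")

print(f"total params:            {total_gap:.1f}% smaller")
print(f"active params:           {active_gap:.1f}% smaller")
print(f"3B active vs 137B total: {mixed_gap:.1f}% smaller")
```

Under those numbers, the total-parameter gap rounds to about 75%, the active-parameter gap is 40%, and only the mixed comparison lands near 98%.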

stingraycharles a day ago [ - ]

I understand what you’re saying, but I am generally very careful when comparing models and their benchmarks; benchmarks often don’t really match “real world” quality.

yorwba 11 hours ago [ - ]

The technical report https://microsoft.ai/wp-content/uploads/2026/06/main_2026060... has a lot of detail about decontaminating their training data and developing new in-house benchmarks to ensure reliable evaluation. If other models were just overfit to public benchmarks while Microsoft produced something that generalizes better to unseen data, they could've used those in-house benchmarks to argue that point.

Instead, they only do cherry-picked comparisons against Anthropic's small models, and not the full spectrum of competitors.

Without evidence to the contrary, I'll interpret this as just what happens when you're late to the party and insist on doing everything from scratch.

Maybe coaxing reasoning behavior out of their base model without kickstarting it by distilling from existing models provided them with valuable experience that will help improve their future models, or maybe it was an unnecessary waste of time.

fmajid 6 hours ago [ - ]

If their model was trained purely on properly licensed data, the reduced legal liability could be a selling point.

minraws a day ago [ - ]

They did release MAI-Thinking-1 to compete with Sonnet. Not sure at all why that isn't at the top here.

ignoramous 9 hours ago [ - ]

Can't yet use MAI-Thinking-1? [0] And no indication of it being made available in GitHub Copilot, either.

[0] Not even here: https://playground.microsoft.ai/

giancarlostoro a day ago [ - ]

Good question, and I missed that entirely!

lostmsu 20 hours ago [ - ]

Compete? It is behind Kimi K2.6, which is in turn way behind Sonnet.