See here https://cursor.com/blog/composer-2-5

85% of the compute for the final model is from them, and not the base Kimi model.

That just means it cost a lot.

Does it perform meaningfully better than the Kimi model given all that extra compute? And proportionally to the amount spent?

That's something for us and the benchmarks to decide.

However, it definitely isn't _just_ Kimi. The weights will be different after that 85% of extra training on top of the base model.

Whether those different weights are better or worse doesn't change the fact that, in most meaningful ways, it's not the same model as the base one.

I would encourage you to look up their blog posts about their post-training process if you want a bit more faith that they aren't running an extra 85% of compute and burning money on no-ops.

"Just Kimi" is hyperbole, to be clear.

I don't think it's all no-ops. Still don't think it's a particularly relevant model/company/product.

I'll defer the reading until I see signal that they have something worthwhile. I've watched a couple interviews and used the product, neither of which impressed me.

Cursor's Composer 2.5 is one of the few models out there focused on coding, which is the one thing most of us here want. It's pretty good! It's nowhere near frontier-level, insight-generating genius, but it's regarded as very capable and trustworthy, and it is indeed a lot better than the previous Kimi. It'll be interesting to compare it against Kimi 2.7 Code, which just dropped and is also notably a coding-specific model. I expect we'll see more of this over time, I think it has huge rewards, and Composer 2.5 is early proof.

I'm not super concerned about the spend to train the model, especially given that Kimi was famously incredibly cheaply made, and given what they are competing with. I don't think that's a meaningful concern.

Conversely, and far more relevant in my humble opinion, there's the cost to run the model: Composer 2.5 is easily one of the cheapest models out there. It's fantastically cheap, and its token efficiency is through the roof. I think training a coding-specific model has yielded something incredibly special here, and I hope SpaceXLAIC isn't the only company doing this.