> but finding myself asking Sonnet to rewrite 90% of the code GLM was giving me. At some point I was like "what the hell am I doing" and just switched.

This is a very common sequence of events.

The frontier hosted models are so much better than everything else that it's not worth messing around with anything lesser if you're doing this professionally. The $20/month plans go a long way if context is managed carefully, and for a professional developer or consultant, the $200/month plan is peanuts relative to compensation.

Until last week, you would've been right. Kimi K2.5 is absolutely competitive for coding.

Unless you include it in "frontier", but that term has usually been used to refer to the "Big 3".

Looks like you need at least a quarter terabyte or so of RAM to run that, though?

(At today's RAM prices, upgrading to that would pay for a _lot_ of tokens for me...)

Unfortunately, running anything locally for serious personal use makes no financial sense at all right now.

4x RTX 6000 Pro is probably the minimum you need to have something reasonable for coding work.

That's the setup you want for serious work, yes, so probably $60k-ish all-in(?). Which is a big chunk of money for an individual, but potentially quite reasonable for a company. Being able to get effectively _frontier-level local performance_ for that money was completely unthinkable until now. Correct me if I'm wrong, but I think DeepSeek R1's hardware requirements were far costlier at release, and it had a much bigger gap to the market leader than Kimi K2.5 does. If this trend continues, the Big 3 are absolutely finished when it comes to enterprise, and they'll only have consumer left. Altman and Amodei will be praying to the gods that China doesn't keep up this rate of performance/$ improvement while also releasing everything as open weights.

I'm not so sure about that... even if one $60k machine can handle the load of 5 developers at a time, you're still looking at 5 years of service to recoup $200/mo/dev, and that doesn't even account for improvements in hardware, or in the models the service providers offer, over that same period.
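Rough back-of-the-envelope on that payback period (a sketch assuming the ~$60k all-in figure and the $200/mo plan mentioned above; power, maintenance, and future model improvements are ignored):

```python
# Back-of-the-envelope payback calculation.
# Assumptions (taken from the comments above, not verified figures):
#   - ~$60k one-off hardware cost for a 4x RTX 6000 Pro box
#   - it fully replaces a $200/mo subscription for 5 developers
capex = 60_000           # USD, one-off hardware cost
devs = 5                 # developers sharing the machine
sub_per_dev = 200        # USD per developer per month

monthly_savings = devs * sub_per_dev       # $1,000/month displaced
payback_months = capex / monthly_savings   # 60 months

print(f"Payback: {payback_months:.0f} months (~{payback_months / 12:.0f} years)")
```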

I'd probably rather save the capex, and use the rented service until something much more compelling comes along.

> Kimi K2.5 is absolutely competitive for coding.

Kimi K2.5 is good, but it's still behind the main models like Claude's offerings and GPT-5.2. Yes, I know what the benchmarks say, but the benchmarks for open weight models have been overpromising for a long time and Kimi K2.5 is no exception.

Kimi K2.5 is also not something you can easily run locally without investing $5-10K or more. There are hosted options you can pay for, but like the parent commenter observed: By the time you're pinching pennies on LLM costs, what are you even achieving? I could see how it could make sense for students or people who aren't doing this professionally, but anyone doing this professionally really should skip straight to the best models available.

Unless you're billing hourly and looking for excuses to generate more work I guess?

I disagree, based on having used it extensively over the last week. I find it to be at least as strong as Sonnet 4.5 and 5.2-Codex on the majority of tasks, often better. Note that even among the big 3, each of them has a domain where they're better than the other two. It's not better than Codex (x-)high at debugging non-UI code - but neither is Opus or Gemini. It's not better than Gemini at UI design - but neither is Opus or Codex. It's not better than Opus at tool usage and delegation - but neither is Gemini or Codex.

Yeah, Kimi K2.5 is the first open-weights model that actually feels competitive with the closed models, and I've tried a lot of them now.

Same, though I'm still not sure where it shines. In each of the three big domains I named, the respective top-performing closed model still seems to have the edge, which keeps me from reaching for it more often. Fantastic all-rounder for sure.

What hardware are you running it on?

Disagree that it's behind GPT's top models. It's just slightly behind Opus.

I've been using MiniMax-M2.1 lately. Although benchmarks show it as comparable to Kimi K2.5 and Sonnet 4.5, I find it more pleasant to use.

I still have to occasionally switch to Opus in Opencode planning mode, but not having to rely on Sonnet anymore makes my Claude subscription last much longer.

For many companies, they'd be better off paying $200/month and laying off 1% of the workforce to pay for it.

The issue is they often choose the wrong 1%.

What tools/processes do you use to manage context?