Yeah, this is why I ended up getting a Claude subscription in the first place.
I was using GLM on the ZAI coding plan (jerry-rigged Claude Code for $3/month), but found myself asking Sonnet to rewrite 90% of the code GLM was giving me. At some point I was like "what the hell am I doing" and just switched.
To clarify, the code I was getting before mostly worked; it was just a lot less pleasant to look at and work with. Might be a matter of taste, but I found it had a big impact on my morale and productivity.
> but finding myself asking Sonnet to rewrite 90% of the code GLM was giving me. At some point I was like "what the hell am I doing" and just switched.
This is a very common sequence of events.
The frontier hosted models are so much better than everything else that it's not worth messing around with anything lesser if you're doing this professionally. The $20/month plans go a long way if context is managed carefully. For a professional developer or consultant, the $200/month plan is peanuts relative to compensation.
Until last week, you would've been right. Kimi K2.5 is absolutely competitive for coding.
Unless you include it in "frontier", but that term has usually been used to refer to the Big 3.
Looks like you need at least a quarter terabyte or so of RAM to run that, though?
(At today's RAM prices, upgrading to that would pay for a _lot_ of tokens for me...)
Unfortunately, running anything locally for serious personal use makes no financial sense at all right now.
4x RTX 6000 Pro is probably the minimum you need to have something reasonable for coding work.
That's the setup you want for serious work, yes, so probably $60k-ish all-in(?). Which is a big chunk of money for an individual, but potentially quite reasonable for a company. Being able to get effectively _frontier-level local performance_ for that money was completely unthinkable until now. Correct me if I'm wrong, but I think DeepSeek R1's hardware requirements were far costlier at release, and it had a much bigger gap to the market leader than Kimi K2.5 does. If this trend continues, the big 3 are absolutely finished when it comes to enterprise and they'll only have consumer left. Altman and Amodei will be praying to the gods that China doesn't keep up this rate of performance/$ improvement while also releasing everything as open weights.
I'm not so sure on that... even if one $60k machine can handle the load of 5 developers at a time, you're still looking at 5 years of service to recoup $200/mo/dev (quick math below), and that doesn't even consider other improvements to hardware, or to the models service providers offer, over that same period.
I'd probably rather save the capex, and use the rented service until something much more compelling comes along.
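For what it's worth, the back-of-envelope math as a quick sketch; every number here is from this thread, not a real quote:

```python
# Break-even for the local rig vs. hosted plans (thread numbers only).
rig_cost = 60_000      # 4x RTX 6000 Pro build, all-in ($)
devs_served = 5        # developers one box might serve at a time
plan_per_dev = 200     # hosted plan, $/month per developer

monthly_saving = devs_served * plan_per_dev        # $1,000/month
months = rig_cost / monthly_saving                 # 60 months
print(f"Break-even: {months:.0f} months ({months / 12:.0f} years)")
```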
> Kimi K2.5 is absolutely competitive for coding.
Kimi K2.5 is good, but it's still behind the main models like Claude's offerings and GPT-5.2. Yes, I know what the benchmarks say, but benchmarks for open-weight models have been overpromising for a long time, and Kimi K2.5 is no exception.
Kimi K2.5 is also not something you can easily run locally without investing $5-10K or more. There are hosted options you can pay for, but as the parent commenter observed: by the time you're pinching pennies on LLM costs, what are you even achieving? I could see how it might make sense for students or people who aren't doing this professionally, but anyone doing this professionally really should skip straight to the best models available.
Unless you're billing hourly and looking for excuses to generate more work I guess?
I disagree, based on having used it extensively over the last week. I find it to be at least as strong as Sonnet 4.5 and 5.2-Codex on the majority of tasks, often better. Note that even among the big 3, each of them has a domain where they're better than the other two. It's not better than Codex (x-)high at debugging non-UI code - but neither is Opus or Gemini. It's not better than Gemini at UI design - but neither is Opus or Codex. It's not better than Opus at tool usage and delegation - but neither is Gemini or Codex.
Yeah, Kimi K2.5 is the first open-weights model that actually feels competitive with the closed models, and I've tried a lot of them now.
Same, I'm still not sure where it shines though. In each of the three big domains I named, the respective top performing closed model still seems to have the edge. That keeps me from reaching for it more often. Fantastic all-rounder for sure.
What hardware are you running it on?
Disagree that it's behind GPT's top models. It's just slightly behind Opus.
I've been using MiniMax-M2.1 lately. Although benchmarks show it as comparable with Kimi K2.5 and Sonnet 4.5, I find it more pleasant to use.
I still have to occasionally switch to Opus in Opencode planning mode, but not having to rely on Sonnet anymore makes my Claude subscription last much longer.
For many companies, they'd be better off paying $200/month and laying off 1% of the workforce to pay for it.
The issue is they often choose the wrong 1%.
What tools/processes do you use to manage context?
My very first tests of local Qwen-coder-next yesterday found it quite capable of acceptably improving Python functions when given clear objectives.
I'm not looking for a vibe-coding "one-shot" full-project model. I'm not looking to replace GPT-5.2 or Opus 4.5. But having a local instance running some Ralph loop overnight on a specific aspect, for the price of electricity, is alluring (rough sketch below).
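For the curious, a minimal sketch of what I mean by a Ralph loop, assuming a local OpenAI-compatible server (llama.cpp, vLLM, etc.) on localhost; the endpoint, model name, and OBJECTIVE.md file are all illustrative, and a real harness would also apply the model's edits and run tests between iterations:

```python
# Crude overnight "Ralph loop": feed the same objective to a local model
# over and over and let it grind. Endpoint/model/file names are made up.
import time
import requests

prompt = open("OBJECTIVE.md").read()  # hypothetical task spec

for attempt in range(50):  # bounded stand-in for "overnight"
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",  # assumed local server
        json={
            "model": "qwen-coder-next",  # assumed served model name
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=600,
    )
    print(f"--- attempt {attempt} ---")
    print(resp.json()["choices"][0]["message"]["content"])
    time.sleep(5)  # brief pause between runs
```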
Similar experience for me. I tend to let GLM-4.7 have a go at the problem, then if it keeps struggling I'll switch to Sonnet or Opus to solve it. GLM is good for the low-hanging fruit and planning.
Same. I messed around with a bunch of local models on a box with 128GB of VRAM and the code quality was always meh. Local AI is a fun hobby though. But if you want to just get stuff done it’s not the way to go.
Did you eventually move to the $20/mo Claude plan, the $100/mo plan, the $200/mo plan, or API-based? If API-based, how much are you averaging a month?
The $20 one, but it's hobby use for me; I'd probably need the $200 one if I was full time. Ran into the 5-hour limit in like 30 minutes the other day.
I've also been testing OpenClaw. It burned 8M tokens during my half hour of testing, which would have been something like $50 with Opus on the API (rough math at the bottom of this comment). (Which is why everyone was using it with the sub, until Anthropic apparently banned that.)
I was using GLM on Cerebras instead, so it was only $10 per half hour ;) Tried to get their Coding plan ("unlimited" for $50/mo) but it was sold out...
(My fallback is a whole year of GLM from ZAI for $20; it's just a bit too slow for interactive use.)
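The Opus math above, as a rough sanity check; the per-token rates and the input/output split are assumptions for illustration, not quoted prices, so check Anthropic's current pricing page before trusting them:

```python
# Rough check on "8M tokens ~= $50 with Opus on the API".
tokens = 8_000_000
input_share = 0.9             # agent loops are very input-heavy (assumed)
in_rate, out_rate = 5, 25     # assumed $/M tokens for input and output

cost = (tokens * input_share / 1e6) * in_rate + (
    tokens * (1 - input_share) / 1e6
) * out_rate
print(f"~${cost:.0f}")        # -> ~$56, in the ballpark of $50
```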
Try Codex. It's better (subjectively; objectively they're in the same ballpark), and its $20 plan is way more generous. I can use GPT-5.2 on high (I prefer the overall smarter models to the -codex coding ones) almost nonstop, sometimes a few sessions in parallel, before I hit any limits (if ever).
I now have 3 x $100 plans. Only then am I able to use it full time; otherwise I hit the limits. I'm a heavy user, often working on 5 apps at the same time.
Shouldn't the $200 plan give you 4x? Why 3 x $100 then?
Good point, I need to look into that one. Pricing is also changing constantly with Claude.