The harness matters far more than most people think. See this post about the CORE benchmark, where Opus’ score almost doubled when they switched from their own harness to Claude Code: https://x.com/sayashk/status/1996334941832089732

Mario, the creator of the Pi terminal agent, has a great blog post[0] where he talks about how TerminalBench's highest scores come from using the Terminus 2 harness, which uses tmux under the hood.
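
To make the tmux point concrete, here's a minimal sketch of how a harness can drive a real terminal through tmux so the model sees the same interactive environment a human would. This is not Terminus 2's actual code; the session name and `agent_exec` helper are made up for illustration, and a real harness would poll for a prompt instead of sleeping.

```python
import subprocess
import time

SESSION = "agent"  # hypothetical session name

def tmux(*args):
    """Thin wrapper around the tmux CLI."""
    return subprocess.run(["tmux", *args], capture_output=True, text=True)

# Start a detached tmux session the agent can type into.
tmux("new-session", "-d", "-s", SESSION)

def agent_exec(command: str, wait: float = 1.0) -> str:
    """Send a command as keystrokes, then read back the visible pane.

    The model only ever sees terminal output, exactly like a human at a
    shell would -- the harness is doing the 'body' work here."""
    tmux("send-keys", "-t", SESSION, command, "Enter")
    time.sleep(wait)  # crude; a real harness would wait for the prompt to return
    return tmux("capture-pane", "-t", SESSION, "-p").stdout

print(agent_exec("ls -la"))
tmux("kill-session", "-t", SESSION)
```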

When I was reading the Opus 4.6 launch post, they mentioned the same thing: their TerminalBench score was based on Terminus 2, not CC.

0. https://mariozechner.at/posts/2025-11-30-pi-coding-agent/

Which, IMHO, is why we should be able to change harnesses freely or make our own. Being locked into a specific harness because you pay 20 bucks per month vs. pay-per-use ... is kinda dumb.

The reason Anthropic is pushing a closed harness is that they're not confident in their ability to win on model quality long term, so they're trying to build lock-in. Owning the harness also lets them capture some additional telemetry, but given the amount of data the agent loop already transmits, that borders on unethical spyware (which might be part of the reason they're afraid to open source it).

Ultimately the market is going to force them to open up and let people flex their subs.

> Being locked into a specific harness because you pay 20 bucks per month vs. pay-per-use ... is kinda dumb.

I’ll probably get downvoted for this, but am I the only one who thinks it’s kind of wild how much anger is generated by these companies offering discounted plans for use with their tools?

At this point, there would be less anger and outrage on HN if they all just charged us the same high per-token rate and offered no discounts or flat rate plans.

No, you're not the only one. The outraged entitlement is pretty funny tbh. How dare they dictate that they'll only subsidize your usage if you use their software!!

I'm not outraged, but the dynamic creates a tension that prevents me from building brand loyalty.

It's also another place where having the tool change out from under you can drastically alter the quality of your work in unexpected ways.

Like most things - assume the "20/100/200" dollar deals that are great now are going to go down the enshittification route very rapidly.

Even if the "limits" on them stay generous, the product will start shifting to prioritize things the user doesn't want.

Tool recommendations are my immediate and near-term fear - paid placement for dev tools, both at the model level and the harness level, seems inevitable.

---

The right route is open models and open harnesses, ideally on local hardware.

The harness is effectively the agent's 'body'. Swapping the brain (model) is good, but if the body (tools/environment) is locked down or inefficient, the brain can't compensate. Local execution environments that standardize the tool interface are going to be critical for avoiding that lock-in.
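
As a hedged sketch of what "swappable brain, standard body" could mean in practice: the tool interface stays fixed while the model behind `call_model` can be pointed at any local or hosted endpoint. `agent_loop`, `TOOLS`, and the dict-shaped model response here are placeholders I've invented for illustration, not any vendor's real API.

```python
import subprocess
from typing import Callable, Dict

# The "body": a fixed set of tools, independent of which model drives them.
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def run_shell(cmd: str) -> str:
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

TOOLS: Dict[str, Callable[[str], str]] = {"read_file": read_file, "run_shell": run_shell}

def agent_loop(task: str, call_model: Callable[[str], dict], max_steps: int = 10) -> str:
    """Generic harness loop. Any model client that returns either
    {"tool": name, "arg": value} or {"answer": text} can drive the same body."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = call_model(transcript)              # swap the "brain" here
        if "answer" in step:
            return step["answer"]
        result = TOOLS[step["tool"]](step["arg"])  # same tools for every model
        transcript += f"\n[{step['tool']}] -> {result}\n"
    return "step budget exhausted"
```

The point of keeping the loop this dumb is that nothing in it cares whether `call_model` is a local open-weights model or a frontier API, which is exactly the lock-in escape hatch being argued for above.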

At this point, subsidizing Chinese open-weights vendors by paying for them is just the right thing to do. Maybe they'll go closed-weights too once they become SotA, but they're pretty close now and haven't done it.

I am wondering what kinds of harnesses are best for GLM, DeepSeek, Qwen, Kimi.

OpenCode is great in general. At least one of them is specifically trained on CC - I think it was Qwen - so for those models CC should give the best results.

Claude Code works better than OpenCode for GLM models for me.

OpenCode with Kimi has been great for me.

> Like most things - assume the "20/100/200" dollar deals that are great now are going to go down the enshittification route very rapidly.

I don’t assume this at all. In fact, the opposite has been happening in my experience: I try multiple providers at the same time and the $20/month plans have only been getting better with the model improvements and changes. The current ChatGPT $20/month plan goes a very long way even when I set it to “Extra High” whereas just 6 months ago I felt like the $20/month plans from major providers were an exercise in bouncing off rate limits for anything non-trivial.

Inference costs are only going to go down from here and models will only improve. I’ve been reading these warnings about the coming demise of AI plans for 1-2 years now, but the opposite keeps happening.

> Inference costs are only going to go down from here and models will only improve. I’ve been reading these warnings about the coming demise of AI plans for 1-2 years now, but the opposite keeps happening.

This period also coincides with the frontier labs raising ever larger rounds. If Anthropic IPOs (which I honestly doubt), then we may get a better sense of actual market prices, as it's unlikely the markets will keep letting them spend more and more money each year without a return.

> The current ChatGPT $20/month plan goes a very long way

It sure does and Codex is great, but do you think they'll maintain the current prices if/when it eventually dominates Claude Code in terms of market share and mindshare?

I think we'll always have multiple options providing similar levels of service, like we do with Uber and Lyft.

Unlike Uber and Lyft, the price of inference continues to go down as datacenter capacity comes online and compute hardware gets more powerful.

So I think we'll always have affordable LLM services.

I do think the obsession with prices of the entry-level plans is a little odd. $20/month is nothing relative to the salaries people using these tools receive. HN is full of warnings that prices are going to go up in the future, but what's that going to change for software developers? Okay, so my $20/month plan goes to $40/month? $60/month? That's still less than I pay for internet access at home.