Hacker News

Is Claude Code the best coding harness? Is anyone running evals on that?

In my anecdotal experience, it is not. The same model, Opus, works better in third-party harnesses such as Factory Droid or Amp.

Claude Code, on the other hand, is the most subsidized one, both for consumers (through the Max subscription) and for enterprises (token discounts). It is also heavily optimized for cost, especially through token caching and reduced thinking, at the expense of quality.

viking123 30 minutes ago

Codex is way more subsidized currently, with much more generous limits even at 20 dollars a month.

jedisct1 an hour ago

Ironically, there are plenty of evals showing that it’s not actually that great. Even with Anthropic models, other harnesses are more efficient, both in terms of the number of problems solved and token usage.

Significant regressions also seem to be introduced from time to time after releases.

The UX is great, and if you need a kitchen sink packed with tons of features, even though you’ll probably only end up using a fraction of them, it’s fine.

But if you want something that performs well, you’re better off using something like Opencode or Swival.dev

DeathArrow 7 hours ago

Terminal Bench tests agent harnesses.

The best two are Codex and Forge Code.

However, I am using plugins and skills that are either only compatible with Claude Code or work best with it.

So, for me, Claude Code with plugins like claude-meme, Context Mode, Superpowers and Get Shit Done is better than other tools.

I think everyone should test multiple models and multiple agent harnesses for their specific needs, codebase, and way of working.