It's kind of funny that METR is known primarily for both the most bearish study on AI progress (the original 20% slowdown one) and the most bullish one (the long-task horizon study showing an exponential increase in the duration of tasks AI models can accomplish, plotted against model release date).

In either case, it seems people ended up bolstering their preexisting views on AI based on whichever study most affirmed them (for the former, that AI coding models didn't actually help and created a mirage of productivity that required more work to fix than it was worth; for the latter, that AI models were improving at an exponential rate and will invariably eclipse SWEs in all tasks in a deterministic amount of time).

I think the truth is somewhere in the middle. Just anecdotally, we've seen multi-million dollar fortunes minted by small teams writing 90% of their code with AI assistance. Anthropic claims they solely use agents to code and don't modify any code manually.

> Anthropic claims they solely use agents to code and don't modify any code manually.

Have you used CC? It shows. They did not make their fortune off this, and it’s at least lost me a customer because of how sloppy it is. The model is good, and it’s why they have to gate access to it. I’d much rather use a different harness.

I do think you’re on to something, though. As societal wealth further concentrates among the few, we’re going to get more and more slop for the rest of us because we have no money (relatively speaking). Agentic coding is here to stay because we as a society are being forced to accept more and more slop. It’s already rampant; this is just automating it.

...uh, I think Claude Code is great, actually. A lot of that is indeed just the strength of the underlying model, but the local client is great too. Plan mode, checkpoints, subagents... I've been using Claude Code for a year now, and I feel like Anthropic has steadily been eliminating pain points.

It's certainly a lot better than the Gemini cli!

Allow me a momentary rant...

I love Claude Code and use it all day, every day for work. I would self-identify as an unofficial Claude Code evangelist amongst my coworkers and friends.

But Claude Code is buggy as hell. Flicker is still present. Plugin/skill configuration is an absolute shitshow. The docs are (very) outdated/incomplete. The docs are also poorly organized, embarrassingly so. I know Claude Code's feature set quite well, and I still have a hard time navigating their docs to find a particular thing sometimes. Did you know Claude Code supports "rules" (similar to the original Cursor Rules)? Find where they are documented, and tell me that's intuitive and discoverable. I'm sorry, but with an unlimited token (and I assume, by now, personnel) budget, there is no excuse for the literal inventors of Claude Code to have documentation this bad.

I seriously wish they would spend some more cycles on quality rather than continuing to push so many new features. I love new features, but when I can't even install a plugin properly (without manual file system manipulation) because the configuration system is so bugged, inscrutable, and incompletely documented, I think it's obvious that a rebalancing is needed. But then again, why bother if you're winning anyway?

Side note: comparing it to Gemini CLI is simply cruel. No one should ever have to use or think about Gemini CLI.

Functionality-wise, it's great, but it's a buggy mess, and it seems to be getting worse with each release.

I've been using delegated Claude agents in VS Code and they crash so much it's insane... I switched to Copilot's local Claude agents and it works much better.

Idk about this whole vibe coding thing though... We'll see what happens.

I’ve been a heavy user for about four months now, and it’s definitely getting better for me. How would you say it’s getting worse?