> If even Anthropic, the company with the world's best agentic vibecoders...
But that's really not what they have. They have AI experts who are creating incredible LLMs.
Everything else is worse than meh: Claude Code is really bad. Such a turd would never have gained any traction if it weren't for the LLMs behind it.
I use LLMs to code daily (Claude Code still, mind you, as I haven't taken the time to switch yet) and these models are both amazing and pathetic.
If you don't verify everything they output, they do the absolute craziest things imaginable.
One example: I had an Anthropic model notice a "pattern" in range-bound integer values. I had them bounded between, e.g., 0xCAFE0000 and 0xCAFEFFFF. At some point a comparison/validation was needed, and instead of doing an integer comparison the Anthropic model went ballistic: it converted the numbers to strings, then started doing substring matching on "0xCAFE", and went even more "expert" by verifying at which position the match was happening. All that while explaining why it couldn't possibly fail.
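To make the difference concrete, here's a rough sketch in Python. It is a reconstruction, not the model's actual output: the function names are made up, and the string version is only my approximation of the approach described above.

```python
# Bounds from the example above.
LOW = 0xCAFE0000
HIGH = 0xCAFEFFFF


def in_range(value: int) -> bool:
    """The boring version: a plain integer range check."""
    return LOW <= value <= HIGH


def in_range_stringly(value: int) -> bool:
    """Approximation of the string-based approach (hypothetical reconstruction).

    Formats the number as hex, then checks that "0xCAFE" matches at position 0.
    """
    text = f"0x{value:X}"
    return text.find("0xCAFE") == 0


# Both agree on a value that really is in range.
print(in_range(0xCAFE1234), in_range_stringly(0xCAFE1234))

# They disagree as soon as the number of hex digits changes:
# 0xCAFE and 0xCAFE00000 are out of range but still start with "0xCAFE".
print(in_range(0xCAFE), in_range_stringly(0xCAFE))
print(in_range(0xCAFE00000), in_range_stringly(0xCAFE00000))
```

So much for "couldn't possibly fail": the prefix match says nothing about how many digits follow it, which is exactly what the integer comparison gets right for free.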
Why did it do that? Very likely because, in a comment, it saw "0xCAFE..." as a string. And the thing saw a pattern.
Can you believe it? There's a pattern. So it must light up connections. We've got a pattern!
No amount of kludges, hidden pre-processing, or hidden post-processing is going to fix the "quality" of the code produced by something that, instead of doing an integer comparison, converts things to strings and then does substring searches and index computations.
There's no fixing that.
Yesterday: I had to use three guard clauses before pushing data... Two of the three "logic gates" (as the model would explain they were, which is kinda right) it got right. The third one: same thing... It was planning to go ballistic, introduce countless lines of code and insane abstractions, to make a test that was solved with a one-line timestamp comparison.
Because it does things like that, the people who say they don't code anymore are delusional if they think this gives, as of today, quality code.
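For a sense of scale, something like the following is all that was needed. This is a sketch only: the first two conditions, the function name, and the parameters are invented for illustration, and only the third guard (the timestamp comparison) comes from the story above.

```python
from datetime import datetime, timezone


def should_push(payload: bytes, is_connected: bool,
                sample_time: datetime, last_pushed: datetime) -> bool:
    # Guard 1 (hypothetical): nothing to send.
    if not payload:
        return False
    # Guard 2 (hypothetical): nowhere to send it.
    if not is_connected:
        return False
    # Guard 3: the one-line timestamp comparison.
    if sample_time <= last_pushed:
        return False
    return True


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
newer = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)

print(should_push(b"data", True, newer, last))  # newer sample: push
print(should_push(b"data", True, last, last))   # stale sample: skip
```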
It's like that other dude who was happy to produce 37K LOC per day and counting.
> ... it really says something about the quality of the world's best agentically produced code
Oh it is totally shit code. But if you monitor everything and vet everything they do, it's helpful.
I find these LLMs way more helpful at finding the source of bugs (not fixing them: finding them, which is 90% of the job anyway) and at acting like rubber ducks than at writing code.
Claude Code sucks. Claude Code CLI sucks. Their only "solution" to every problem is to create VMs and headless browsers and to resort to incredible hacks (the infamous "game loop" that modifies the characters output by the LLM is just shameful) to try to hide the misery. It's miserable kludges everywhere.
And the only reason these miserable kludges are not entirely falling apart is that they rest on the shoulders of actual giants: projects like Linux, QEMU, etc. that were not vibe-coded.
It's sad to have useful tools (the models) and to make such poor use of them.
I'm pretty sure that, in the end, it's just like open-source powering the entire world by now: we'll have open-source projects like Pi and then newer ones that are going to come out and fix the mess we have now. And they're not going to be 100% vibe-coded by people whose job is "to write loops".