Trustworthy vibe coding. Much better than the other kind!
Not sure I really understand the comparisons though. They emphasize the cost savings relative to Haiku, but Haiku kinda sucks at this task, and Leanstral is worse? If you're optimizing for correctness, why would "yeah it sucks but it's 10 times cheaper" be relevant? Or am I misunderstanding something?
On the promising side, Opus doesn't look great at this benchmark either — maybe we can get better than Opus results by scaling this up. I guess that's the takeaway here.
I also don't understand the focus on vibe coding in the marketing. Vibe coding kind of has the image of being for non-devs, right?
I do like agents (like Claude Code), but I don't consider myself to be vibe coding when I use them. Either I'm using a language/framework I know and check every step. OR I'm learning, checking every step and asking for explanations.
I tried vibe coding, and really dislike the feeling I have when doing it. It feels like building a house, but without caring about it, and just using whatever tech. Sure I may have moisture problems later, but it's a throwaway house anyway. That's how I feel about it. Maybe I have a wrong definition.
Maybe it's good to not use "vibe coding" as a synonym for programming with agent assistance. Just to protect our profession. Like: "Ah you're vibing" (because you have Claude Code open), "No, I'm using CC to essentially type faster and prevent syntax errors and get better test coverage, maybe to get some smart solutions without deep research. But I understand and vouch for every loc here. 'We are not the same.'"
> I tried vibe coding, and really dislike the feeling I have when doing it. It feels like building a house, but without caring about it, and just using whatever tech. Sure I may have moisture problems later, but it's a throwaway house anyway. That's how I feel about it. Maybe I have a wrong definition.
No, I feel the same. I vibe-coded a few projects and after a few weeks I just threw them away, ultimately I felt I just wasted my time and wished I coudl get it back to do something useful.
Yeah, the original meaning of Vibe Coding was "not looking at the code, just going on vibes", but a lot of people now use it to mean "AI was involved in some way".
I see a whole spectrum between those two. I typically alternate between "writing code manually and asking AI for code examples" (ChatGPT coding), and "giving AI specific instructions like, write a function blarg that does foo".
The latter I call Power Coding, in the sense of power armor, because you're still in control and mostly moving manually, but you're much stronger and faster.
I like this better than "tell agent to make a bunch of changes and come back later" because first of all it doesn't break flow (you can use a smaller model for such fine-grained changes so it goes very fast -- it's "realtime"), and second, you don't ever desync from the codebase and need to spend extra time figuring out what the AI did. Each change is sanity-checked as it comes in.
So you stay active, and the code stays slop-free.
I don't hear a lot of people doing this though? Maybe we just don't have good language for it.
"I don't hear a lot of people doing this though? Maybe we just don't have good language for it."
Interesting thought. I guess we don't really, vibe coding is to powerful a term. But perhaps just call it LLM assisted programming? Where we used to do Stack Overflow assisted programming. LLM assisted programming is more focused, goes faster. But since you're wandering around less I guess you learn less, you're exposed to less new information, some of it was helpful in unexpected ways. Now you have to make learning a specific part of your flow, and that takes discipline/time. But is well worth it imho. Actually, for me it's the only way to enjoy it.
> It feels like building a house, but without caring about it, and just using whatever tech.
So, most homebuilders (in the US) unfortunately.
I myself am now and expert at insulation and all the vapor-permeable and vapor-blocking membranes/foils/foams that come with it.
It came at great cost though, I hated the process of learning and the execution. I was less than happy for some years. But I feel even more uncomfortable vibe-home-improving than I do vibe-coding. The place is starting to look nice now though.
They haven't made the chart very clear, but it seems it has configurable passes and at 2 passes it's better than Haiku and Sonnet and at 16 passes starts closing in on Opus although it's not quite there, while consistently being less expensive than Sonnet.
pass@k means that you run the model k times and give it a pass if any of the answers is correct. I guess Lean is one of the few use cases where pass@k actually makes sense, since you can automatically validate correctness.
Oh my bad. I'm not sure how that works in practice. Do you just keep running it until the tests pass? I guess with formal verification you can run it as many times as you need, right?
It’s really not hard — just explicitly ask for trustworthy outputs only in your prompt, and Bob’s your uncle.
Assuming that what you're dealing with is assertable. I guess what I mean to say is that in some situations is difficult to articulate what is correct and what isn't depending in some situations is difficult to articulate what is correct and what isn't depending upon the situation in which the software executes.
And Bob’s your uncle.