One might argue that it’s not too different from the higher-level abstractions you get when using libraries. You get things done faster, write less code, and the library handles some internal state/memory management for you.

Would one be uneasy about calling a library to do stuff rather than manually messing around with pointers and malloc()? For some, yes. For others, it’s a bit freeing, as you can do more high-level architecture without getting mired in, and constantly context-switched by, low-level nuances.

I see this comparison made constantly and for me it misses the mark.

When you use abstractions you are still deterministically creating something you understand in depth with individual pieces you understand.

When you vibe something you understand only the prompt that started it and whether or not it spits out what you were expecting.

Hence the feeling of being lost when you suddenly lose access to frontier models and take a look at your code for the first time.

I’m not saying that’s necessarily always bad, just that the abstraction argument is wrong.

I think it's more: when I don't have access to a compiler I am useless. It's better to go for a walk than learn assembly. AI agents turn our high-level language into code, with various hints, much like the compiler.

If my compiler "went down" I could still think through the problem I was trying to solve, maybe even work out the code on paper. I could reach a point where I would be fairly confident that I had the problem solved, even though I lacked the ability to actually implement the solution.

If my LLM goes down, I have nothing. I guess I could imagine prompts that might get it to do what I want, but there's no guarantee that those would work once it's available again. No amount of thought on my part will get me any closer to the solution, if I'm relying on the LLM as my "compiler".

What stops you from thinking through the problem if an LLM goes down, as you still have its previously produced code in front of you? It's worse if a compiler goes down because you can't even build the program to begin with.

In my opinion, this sort of learned helplessness is harmful for engineers as a whole.

Yeah I actually find writing the prompt itself to be such a useful mechanism of thinking through problems that I will not-infrequently find myself a couple of paragraphs in and decide to just delete everything I've written and take a new tack. Only when you're truly outsourcing your thinking to the AI will you run into the situation that the LLM being down means you can't actually work at all.

An interesting element here, I think, is that writing has always been a good way to force you to organize and confront your thoughts. I've liked working on writing-heavy projects, but in fast-moving environments writing things out before coding is easy to skip over. Working with LLMs has sort of inverted that: you have to write to produce code with AI (usually, at least), and the more clarity of thought you put into the writing, the better the outcomes (usually).

Why couldn’t you actually write out the documents and think through the problem? I think my interaction is inverted from yours. I have way more thinking and writing I can do to prep an agent than I can a compiler and it’s more valuable for the final output.

I think if you're vibe coding to the extent that you don't even know the shapes of data your system works with (e.g. the schema if you use a database) you might be outsourcing a bit too much of your thinking.
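Concretely, "knowing the shapes of your data" can be as modest as being able to write your own schema down from memory. A sketch with an illustrative table (the `orders` table and its columns are made up for this example, not taken from anyone's project):

```python
import sqlite3

# If an LLM wrote your whole app, could you reproduce this yourself?
# Hypothetical schema, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL,
        total_cents INTEGER NOT NULL CHECK (total_cents >= 0),
        created_at  TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute("INSERT INTO orders (user_id, total_cents) VALUES (?, ?)", (1, 4999))
row = conn.execute("SELECT total_cents FROM orders WHERE user_id = 1").fetchone()
print(row[0])  # prints 4999
```

The point isn't the SQL; it's that the constraints (NOT NULL, CHECK) encode decisions you should be able to explain without asking the model.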

This. When compilers came along, I believe a bunch of junior engineers gave up entirely on understanding the shape of the assembly the compiler generated, which was a mistake given that early compilers weren't as effective as they are today. Today's vibe-coders are using these early AI tools, giving up on understanding the shape in the same way, and similarly struggling.

If your compiler produced working executable 20% of the time this would be an apt comparison.

Still misses the mark. You aren’t useless in the same way because you are still in control of reasoning about the exact code even if you never actually write it.

Compilers are deterministic, LLMs are not. They are not "much like".

The difference is that there is a company that can easily take your agents away from you.

Installed on your machine vs. cloud service that's struggling to maintain capacity is an unfair comparison...

> you are still deterministically creating something you understand in depth with individual pieces you understand

You’re overestimating determinism. In practice, most of our code is written such that it works most of the time. This is why we have bugs even in the best and most critical software.

I used to think that being able to write a deterministic hello-world app translates to writing deterministic larger systems. It’s not true. Humans make mistakes. From an executive’s point of view, you have humans who make mistakes and agents who make mistakes.

Self driving cars don’t need to be perfect they just need to make fewer mistakes.

Bugs are not non-determinism. There’s a huge difference between writing buggy code and having no idea what the code even looks like.

"When you use abstractions you are still deterministically creating something you understand in depth with individual pieces you understand."

I always thought the point of abstraction is that you can black-box it via an interface. Understanding it "in depth" is a distraction or obstacle to successful abstraction.

[deleted]

> When you use abstractions you are still deterministically creating something you understand in depth with individual pieces you understand

Hard disagree on that second part. Take something like using a library to make an HTTP call. I don't think there are many engineers who have more than a cursory understanding of what's actually going on under the hood.

It might just be social. When I use an open source HTTP library, much of the reason I use it is that someone has put in the work of making sure it actually works across a diverse set of software and hardware platforms, catching common dumb off-by-one errors, etc.

Sure, the LLM theoretically can write perfect code, just like you could theoretically write perfect code. In real life, though, maintenance is a huge issue.

Perhaps then, the better analogy is like being promoted at your company and having people under you doing the grunt work.

How closely you micromanage it is a factor as well though

This is how I’ve come to think of it. Delegation of the details.

It seems like some kind of technique is needed that maximizes information transfer between huge LLM generated codebases and a human trying to make sense of them. Something beyond just deep diving into the codebase with no documentation.

There's a false dichotomy here between 'deterministic creation' and 'vibing'.

I use Claude all day. It has written, under my close supervision¹, the majority of my new web app. As a result I estimate the process took 10x less time than had I not used Claude, and I estimate the code to be 5x better quality (as I am a frankly mediocre developer).

But I understand what the code does. It's just Astro and TypeScript. It's not magic. I understand the entire thing; not just 'the prompt that started it'.

¹I never fire-and-forget. I prompt-and-watch. Opus 4.7 still needs to be monitored.

In what world do developers “understand” pieces like React, Pandas, or CUDA? Developers only have a superficial understanding of the tools they are developing with.

Some developers, I usually end up fixing bugs in OSS I use

A library is deterministic.

LLMs are not.

That we let a generation of software developers rot their brains on js frameworks is finally coming back to bite us.

We can build infinite towers of abstraction on top of computers because they always give the same results.

LLMs by comparison will always give different results. I've seen it first hand when a $50,000 LLM-generated (but human-guided) code base just stops working and no one has any idea why or how to fix it.

Hope your business didn't depend on that.

Why would that necessarily happen? With an LLM you have perfect knowledge of the code: at any time you can understand any part of your code by simply asking the LLM to explain it. That is one of the superpowers of these tools. They also accelerate debugging by letting you add comprehensive logging, which the LLM can then use to track down the source of problems. You should try it.

> With an LLM you have perfect knowledge of the code. At any time you can understand any part of your code by simply asking the LLM to explain it.

The LLM will give you an explanation but it may not be accurate. LLMs are less reliable at remembering what they did or why than human programmers (who are hardly 100% reliable).

Determinism is a smaller point than the existence of a spec, IMHO. A library has a specification one can rely on to understand what it does and how it will behave.

An LLM does not.

The thing is, it's possible to ask the LLM to add dynamic tracing, logging, metrics, a debug REPL, whatever you want to instrument your codebase with. You have to know to want that, and where it's appropriate to use. You still have to (with AI assistance) wire that all up so that it's visible, and you have to be able to interpret it.

If you didn't ask for traceability, if you didn't guide the actual creation and just glommed spaghetti on top of sauce until you got semi-functional results, that was $50k badly spent.
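A minimal sketch of what "asking for traceability" can look like in practice. Everything here is illustrative (the `billing` logger name, the `app.log` file, the `apply_discount` function are all made up); the point is structured logging that a human or an agent can later grep instead of guessing:

```python
import functools
import logging

logging.basicConfig(
    filename="app.log",  # hypothetical log file an agent can grep later
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
log = logging.getLogger("billing")  # made-up module name

def traced(fn):
    """Log every call with its arguments and result, so a failure
    leaves a breadcrumb trail instead of a mystery."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.debug("call %s args=%r kwargs=%r", fn.__name__, args, kwargs)
        try:
            result = fn(*args, **kwargs)
        except Exception:
            log.exception("error in %s", fn.__name__)
            raise
        log.debug("return %s -> %r", fn.__name__, result)
        return result
    return wrapper

@traced
def apply_discount(price, pct):
    return round(price * (1 - pct / 100), 2)
```

The decorator is trivial to ask an agent to apply everywhere; the hard part, as the comment above says, is knowing to want it.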

And if that had been done, the $50k code base would be a $5,000,000 code base, because the context would be 10 times as large and attention cost in LLMs is quadratic in context length.
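The arithmetic behind the "quadratic" claim, as a back-of-envelope sketch (token counts are made up; this models only the n² scaling of self-attention, ignoring everything else that goes into real inference cost):

```python
def attention_cost(tokens):
    # Self-attention compares every token with every other token,
    # so compute scales roughly with the square of context length.
    return tokens ** 2

small = attention_cost(10_000)    # modest context
large = attention_cost(100_000)   # 10x the context...
print(large // small)             # ...roughly 100x the compute
```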

If only we taught developers under 40 what x^2 meant instead of React.

While I agree with your sentiment, I just want to say that if your approach is to have the LLM read every file into context, or you're working in some gigantic thread (using the million token capacity most frontier models have) that's really not the best way to do it.

Not even a human would work that way... you wouldn't open 300 different Python files and try to memorize the contents of every single one before writing your first code change.

Additionally, you're going to get worse performance at longer context sizes anyway, so you should be limiting context for reasons other than cost [1].

Things that have helped me manage context sizes (working in both Python and kdb+/q):

- Keep your AGENTS.md small but useful. In it you can give rules like "every time you work on a file in the `combobulator` module, you MUST read `combobulator/README.md`". And in those READMEs you point to the other files that are relevant, etc. And of course you have Claude write the READMEs for you...

- Don't let logs and other output fill up your context. Tell the agent to redirect logs and then grep over them, or run your scripts with a different loglevel.

- Use tools rather than letting it go wild with `python3 -c`. These little scripts eat context like there's no tomorrow; I've seen the bots write little Python scripts that send hundreds of lines of JSON into the context.

- This last tip is more subjective but I think there's value in reviewing and cleaning up the LLM-generated code once it starts looking sloppy (for example seeing lots of repetitive if-then-elses, etc.). In my opinion when you let it start building patches & duct-tape on top of sloppy original code it's like a combinatorial explosion of tokens. I guess this isn't really "vibe" coding per se.

[1] https://arxiv.org/html/2602.06319v1
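The "redirect logs, then grep" tip could be sketched as a small helper like this (a hypothetical tool, not any agent framework's real API; the command in the usage comment is illustrative):

```python
import re
import subprocess

def run_quietly(cmd, logfile, pattern):
    """Run a command with all output redirected to a file, then return
    only the lines matching `pattern`, so the agent's context sees a
    handful of relevant lines instead of the full log."""
    with open(logfile, "w") as f:
        subprocess.run(cmd, stdout=f, stderr=subprocess.STDOUT, check=False)
    with open(logfile) as f:
        return [line.rstrip("\n") for line in f if re.search(pattern, line)]

# Illustrative usage: surface only failures from a noisy test run
# failures = run_quietly(["pytest", "-q"], "test.log", r"ERROR|FAILED")
```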

Yes I agree with all of that.

The way I let my agents interact with my code bases is through a '70s BSD Unix-like interface (ed, grep, ctags, etc.), using Emacs as the control plane.

It is surprisingly sparing on tokens, which makes sense since those things were designed to work with a teletype.

Worth noting is that by the time you start doing refactoring, the agents are basically a smarter Google with long-form autocomplete.

All my code bases use that pattern and I'm the ultimate authority on what gets added or removed. My token spend is 10% to 1% of what the average in the team is and I'm the only one who knows what's happening under the hood.

Libraries are not deterministic. CPUs aren’t deterministic. There are margins of error among all things.

The fact that people who claim to be software developers (let alone “engineers”) say this thing as if it is a fundamental truism is one of the most maladaptive examples of motivated reasoning I have ever had the misfortune of coming across.

I would argue it couldn't be more different. I can dive into the source code of any library, inspect it. I can assess how reliable a library is and how popular. Bugs aside, libraries are deterministic. I don't see why this parallel keeps getting made over and over again.

I can dive into the source of LLM-generated code too. In fact it's better, because you have tools to document it more thoroughly than a third-party library you merely use.

> Would one be uneasy about calling a library to do stuff than manually messing around with pointers and malloc()?

The irony is that the neverending stream of vulnerabilities in 3rd-party dependencies (and lately supply-chain attacks) increasingly shows that we should be uneasy.

We could never quite answer the question about who is responsible for 3rd-party code that's deployed inside an application: Not the 3rd-party developer, because they have no access to the application. But not the application developer either, because not having to review the library code is the whole point.

> because not having to review the library code is the whole point.

That’s just not true at bigger companies that actually care about security rather than pretending to. At my current and last employer, someone needs to review third-party code before it can be used. The review is probably not enough to catch subtle bugs like those in the Underhanded C Contest, but at least the general architecture of the library is understood. Oh, and it helps that the two companies were both founded in the twentieth century. Modern startups aren’t the same.

I feel like big / old companies thrive on process and are bogged down in bureaucracy.

Sure, there is a process to get a library approved, and that abstraction makes you feel better, but the guy whose job it is to approve it is not going to spend an entire day reviewing a lib. The abstraction hides what is essentially an "LGTM"; it just takes a week for someone to check it off their Outlook to-dos.

Maybe your experience is different.

I think it's not too different in that specific sense, but there's more to it than that. To bring libraries onto an equal footing, imagine they were cloud-only and had usage limits.

I'm also somewhat addicted to this stuff, and so for me it's high priority to evaluate open models I can run on my own hardware.

I hate this comparison because you're comparing a well defined deterministic interface with LLM output, which is the exact opposite.

A library doesn't randomly drop out of existence because of "high load" or whatever, or limit you to some number of function calls per day. With local models there's no issue, but this API shit is cancer personified: combine all the frontend bugs with the flaky backend, rate limits, and random bans, and it's almost a literal lootbox where you might get a reply back or you might get told to fuck off.

Qwen has become a useful fallback but it's still not quite enough.