Depends on what you do. When I'm using LLMs to generate code for projects I need to maintain (basically, everything non-throw-away-once-used), I treat it as any other code I'd write, tightly controlled with a focus on simplicity and well-thought out abstractions, and automated testing that verify what needs to be working. Nothing gets "merged" into the code without extensive review, and me understanding the full scope of the change.

So with that, I can change the code by hand afterwards or continue with LLMs, it makes no difference, because it's essentially the same process as if I had someone follow the ideas I describe, and then later they come back with a PR. I think probably this comes naturally to senior programmers and those who had a taste of management and similar positions, but if you haven't reviewed other's code before, I'm not sure how well this process can actually work.

At least for me, I manage to produce code I can maintain, and seemingly others to, and they don't devolve into hairballs/spaghetti. But again, requires reviewing absolutely every line and constantly edit/improve.

We recently got a PR from somebody adding a new feature and the person said he doesn't know $LANG but used AI.

The problem is, that code would require a massive amount of cleanup. I took a brief look and some code was in the wrong place. There were coding style issues, etc.

In my experience, the easy part is getting something that works for 99%. The hard part is getting the architecture right, all of the interfaces and making sure there are no corner cases that get the wrong results.

I'm sure AI can easily get to the 99%, but does it help with the rest?

> I'm sure AI can easily get to the 99%, but does it help with the rest?

Yes the AI can help with 100% is it. But the operator of the AI needs to be able to articulate this to the AI .

I've been in this position, where I had no choice but to use AI to write code to fix bugs in another party's codebase, then PR the changes back to the codebase owners. In this case it was vendor software that we rely on which the vendor hadn't fixed critical bugs in yet. And exactly as you described, my PR ultimately got rejected because even though it fixed the bugs in the immediate sense, it presented other issues due to not integrating with the external frameworks the vendor used for their dev processes. At which point it was just easier for the vendor to fix the software their way instead of accept my PR. But the point is that I could have made the PR correct in the first place, if I as the AI operator had the knowledge needed to articulate these more detailed and nuanced requirements to the AI. Since I didn't have this information then the AI generated code that worked but didn't meet the vendors spec. This type of situation is incredibly easy to fall into and is a good example of why you still need a human at the wheel on projects to set the guidance but you don't necessarily need the human to be writing every line of code.

I don't like the situation much but this is the reality of it. We're basically just code reviewers for AI now

Yeah, so what I'm mostly doing, and advocate for others to do, is basically the pure opposite of that.

Focus on architecture, interfaces, corner-cases, edge-cases and tradeoffs first, and then the details within that won't matter so much anymore. The design/architecture is the hard part, so focus on that first and foremost, and review + throw away bad ideas mercilessly.

Yes it does... but only in the hands of an expert who knows what they are doing.

I'd treat PRs like that as proof of concepts that the thing that can be done, but I'd be surprised if they often produced code that should be directly landed.

In the hands of an expert… right. So is it not incredibly irresponsible to release these tools into the wild, and expose it those who are not experts? They will actually become incredibly worse off. Ironically this does not ‘democratise’ intelligence at all - the gap widens between experts and the rest.

I sometimes wonder what would have happened if OpenAI had built GPT3 and then GPT-4 and NOT released them to the world, on the basis that they were too dangerous for regular people to use.

That nearly happened - it's why OpenAI didn't release open weight models past GPT2, and it's why Google didn't release anything useful built on Transformers despite having invented the architecture.

If we lived in the world today, LLMs would be available only to a small, elite and impossibly well funded class of people. Google and OpenAI would solely get to decide who could explore this new world with them.

I think that would suck.

So… what?

With all due respect I don’t care about an acceleration in writing code - I’m more interested in incremental positive economic impact. To date I haven’t seen anything convince me that this technology will yield this.

Producing more code doesn’t overcome the lack of imagination, creativity and so on to figure out what projects resources should be invested in. This has always been an issue that will compound at firms like Google who have an expansive graveyard of projects laid to rest.

In fact, in a perverse way, all this ‘intelligence’ can exist. At the same time humans can get worse in their ability to make judgments in investment decisions.

So broadly where is the net benefit here?

You mean the net benefit in widespread access to LLMs?

I get the impression there's no answer here that would satisfy you, but personally I'm excited about regular people being able to automate tedious things in their lives without having to spend 6+ months learning to program first.

And being able to enrich their lives with access to as much world knowledge as possible via a system that can translate that knowledge into whatever language and terminology makes the most sense to them.

“I'm excited about regular people being able to automate tedious things in their lives without having to spend 6+ months learning to program first.”

Bring the implicit and explicit costs to date into your analysis and you should quickly realise none of this makes sense from a societal standpoint.

Also you seem to be living in a bubble - the average person doesn’t care about automating anything!

The average person already automates a lot of things in their day to day lives. They spend far less time doing the dishes, laundry, and cleaning because parts of those tasks have been mechanized and automated. I think LLMs probably automate the wrong thing for the average person (i.e., I still have to load the laundry machine and fold the laundry after) but automation has saved the average person a lot of time

For example, my friend doesn’t know programming but his job involves some tedious spreadsheet operations. He was able to use an LLM to generate a Python script to automate part of this work. Saving about 30 min/day. He didn’t review the code at all, but he did review the output to the spreadsheet and that’s all that matters.

His workplace has no one with programming skills, this is automation that would never have happened. Of course it’s not exactly replacing a human or anything. I suppose he could have hired someone to write the script but he never really thought to do that.

What sorts of things will the average, non-technical person think of automating on a computer that are actually quality-of-life-improving?

A work colleague had a tedious operation involving manually joining a bunch of video segments together in a predictable pattern. Took them a full working day.

They used "just" ChatGPT on the web to write an automation. Now the same process takes ~5 minutes of work. Select the correct video segments, click one button to run script.

The actual processing still takes time, but they don't need to stand there watching it progress so they can start the second job.

And this was a 100% non-tecnical marketing person with no programming skills past Excel formulas.

My favorite anecdotal story here is that a couple of years ago I was attending a training session at a fire station and the fire chief happened to mention that he had spent the past two days manually migrating contact details from one CRM to another.

I do not want the chief of a fire station losing two days of work to something that could be scripted!

I don't want my doctor to vibe script some conversion only to realize weeks or months later it made a subtle error in my prescription. I want both of them to have enough fund to hire someone to do it properly. But wanting is not enough unfortunately...

Humans make subtle errors all the time too though. AI results still need to be checked over for anything important, but it's on a vector toward being much more reliable than a human for any kind of repetitive task.

Currently, if you ask an LLM to do something small and self-contained like solve leetcode problems or implement specific algorithms, they will have a much lower rate of mistakes, in terms of implementing the actual code, than an experienced human engineer. The things it does badly are more about architecture, organization, style, and taste.

But with a software bug, the error becomes rapidly widespread and systematic, whereas human error are often not. Doing wrong with a couple of prescription because the doc worked for 12+ hrs is different from systematically doing wrong on a significant number of prescriptions until someone double check the results.

An error in a massive hand-crafted Excel sheet also becoms systematic and wide-spread.

Because Excel has no way of doing unit tests or any kind of significant validation. Big BIG things have gone to shit because of Excel.

Things that would have never happened if the same thing was a vibe-coded python script and a CSV.

I agree with the excel thing. Not with thinking it can't happen with vibecoded python.

I think handling sensitive data should be done by professional. A lawyer handles contracts, a doctor handles health issue and a programmer handles data manipulation through programs. This doesn't remove risk of errors completely, but it reduces it significantly.

In my home, it's me who's impacted if I screw up a fix in my plumbing, but I won't try to do it at work or in my child's school.

I don't care if my doctor vibe codes an app to manipulate their holidays pictures, I care if they do it to manipulate my health or personal data.

Of course issues CAN happen with Python, but at least with Python we have tools to check for the issues.

Bunch of your personal data is most likely going through some Excel made by a now-retired office worker somewhere 15 years ago. Nobody understands how the sheet works, but it works so they keep using it :) A replacement system (a massive SaaS application) has been "coming soon" for 8 years and cost millions, but it still doesn't work as well as the Excel sheet.

> Also you seem to be living in a bubble - the average person doesn’t care about automating anything!

One of my life goals is to help bring as many people into my "technology can automate things for you" bubble as I possibly can.

You can apply the same logic to all technologies, including programming languages, HTTP, cryptography, cameras, etc. Who should decide what's a responsible use?

I'm curious about the economic aspects of this. If only experts can use such tools effectively, how big will the total market be and does that warrant the investments?

For companies, if these tools make experts even more special, then experts may get more power certainly when it comes to salary.

So the productively benefits of AI have to be pretty high to overcome this. Does AI make an expert twice as productive?

I have been thinking about this in the last few weeks. First time I see someone commenting about it here.

- If the number of programmers will be drastically reduced, how big of a price increase companies like Anthropic would need to be profitable?

- If you are a manager, you now have a much higher bus factor to deal with. One person leaving means a greater blow on the team's knowledge.

- If the number of programmers will be drastically reduced, the need for managers and middle managers will also decline, no? Hmm...

Coding style can be deterministically checked for, and should be checked, automatically during linting. And no PR should get a single human pair of eyes, except for the author, looking at it until all CI checks have passed.

Many many other stylistic choices and code complexity can be automatically checked, why aren't you doing it?

> We recently got a PR from somebody adding a new feature and the person said he doesn't know $LANG but used AI.

"Oh, and check it out: I'm a bloody genius now! Estás usando este software de traducción in forma incorrecta. Por favor, consulta el manual. I don't even know what I just said, but I can find out!"

I think we will find out that certain languages, frameworks and libraries are easier for AI to get all the way correct. We may even have to design new languages, frameworks and libraries to realize the full promise of AI. But as the ecosystem around AI evolves I think these issues will be solved.

... And with this level of quality control, is it still faster than writing it yourself?