In my company I feel that we are getting totally overrun with code that's 90% good, 10% broken, and almost exactly what was needed.

We are producing more code, but quality is definitely taking a hit now that no-one is able to keep up.

So instead of slowly inching towards the result, we are getting 90% there in no time, and then spending lots and lots of time getting to know the code and fixing and fine-tuning everything.

Maybe we ARE faster than before, but it wouldn't surprise me if the two approaches are closer than one might think.

What bothers me the most is that I much prefer to build stuff rather than fixing code I'm not intimately familiar with.

LLMs are amazing at producing boilerplate, which removes the incentive to get rid of it.

Boilerplate sucks to review. You just see a big mass of code and can't fully make sense of it when reviewing. Also, GitHub sucks for reviewing PRs with too many lines.

So junior/mid devs are just churning out boilerplate-rich code and don't really learn.

The only outcome here is code quality is gonna go down very very fast.

I envy the people working at mystical places where humans were on average writing code of high quality prior to LLMs. I'll never know you now.

I am working at one right now and I have worked at such places in the past. One of the main tricks is to treat code reviews very seriously so people are not incentivized to write lazy code. You need to build a culture which cares about the quality of both product and code. You also need decent developers, but not necessarily great developers.

It's very easy to go from what you're describing to a place hamstrung by nitpicking, though. The code review becomes more important than the code itself and appearances start mattering more than results.

Oh, I understand what you need to do. It's like losing weight. It's fairly simple.

And at the same time it's borderline impossible, as proven by the fact that people can't do it, even though everyone understands and roughly everyone agrees on how it works.

So the actual "trick" turns out to be understanding what keeps people from doing the necessary things that they all agree are important – like treating code reviews very seriously. And getting that part right turns out to be fairly hard.

Did you make an effort to find those places and get them to hire you?

Some of them will get hired to fix the oceans of boilerplate code.

> In my company I feel that we are getting totally overrun with code that's 90% good, 10% broken, and almost exactly what was needed.

This is painfully similar to what happens when a team grows from 3 developers to 10 developers. All of a sudden, there's a vast pile of code being written, you've never seen 75% of it, your architectural coherence is down, and you're relying a lot more on policy and CI.

Where LLMs differ is that you can't meaningfully mentor them, and you can't let them go after the 50th time they try to turn off the type checker or delete the unit tests to hide bugs.

Probably, the most effective way to use LLMs is to make the person driving the LLM 100% responsible for the consequences. Which would mean actually knowing the code that gets generated. But that's going to be complicated to ensure.

Have thorough code reviews and hold the developer using the LLM responsible for everything in the PR before it can be merged.

Perlis, epigram 7:

7. It is easier to write an incorrect program than understand a correct one.

Link: http://cs.yale.edu/homes/perlis-alan/quotes.html

"but quality is definitely taking a hit now that no-one is able to keep up."

And it's going to get worse! So please explain to me how, in the net, you are going to be better off? You're not.

I think most people haven't taken a decent economics class and don't deeply understand the notion of trade-offs and the fact that there is no free lunch.

Technology has always helped people. Are you one of the people that say optimizing compilers are bad? Do you not use the intellisense? Or IDEs? Do you not use higher level languages? Why not write in assembly all the time? No free lunch right.

Yes, there are trade-offs, but at this point, if you haven’t found a way to significantly amplify and scale yourself using LLMs, and your plan is instead to pretend that they are somehow not useful, that uphill battle can only last so long. The genie is out of the bottle. Adapt to the times or you will be left behind. That’s just what I think.

Technology does not always help people, in fact often it creates new problems that didn't exist before.

Also, telling someone to "adapt to the times" is a bit silly. If it helped as much as it's claimed, there wouldn't be any need to try and convince people they should be using it.

A LOT of parallels with crypto, which is still trying to find its killer app 16 years later.

I don’t think anyone needs to be convinced at this point. Every developer is using LLMs, and I really can’t believe someone who has made a career out of automating things wouldn’t be immediately drawn to at least trying them. Every single company seems convinced and is using it too. The comparison to crypto makes no sense.

> Every developer is using LLMs

Citation needed. In my circles, senior engineers are not using them a lot, or only in very specific use cases. My company is blocking LLM use apart from a few pilots (which I am part of, and while Claude Code is cool, its effectiveness on a 10-year-old distributed codebase is pretty low).

You can't make sweeping statements like this, software engineering is a large field.

And I use Claude Code for my personal projects; I think it's really cool. But the code quality is still not there.

Stack Overflow recently published a survey in which something like 80% of developers were using AI and the rest “want to soon”. By now I have trouble believing a competent developer is still convinced they shouldn’t use it at all, though a few Luddites perhaps might hold on for a bit longer.

Stack Overflow published a report about text editors and Emacs wasn’t part of the list. So I’m very sceptical about SO surveys.

I was also offended by that :D.

“Using AI” is a very broad term. Using AI to generate lorem ipsum is still “using AI”.

> You can't make sweeping statements like this, software engineering is a large field.

that goes both ways

I need to be convinced.

Go ahead, convince me. Please describe clearly and concisely in one or two sentences the clear economic value/advantage of LLMs.

Careful now, you will scare them away!!!!!

People love stuff that makes them feel like they are doing less work. Cognitive biases distort reality and rational thinking; we know this already from behavioural economics.

The company I work for uses LLMs for digital marketing; it has over 100M ARR selling products built on top of LLMs with real life measurable impact as measured by KPIs.

> real life measurable impact as measured by KPIs

This is making me even more skeptical of your claims. Individual metrics are often very poor at tracking reality.

Individual metrics are often very good at distorting reality, which is why corporate executives love them so much.

Digital marketing is old. What about LLMs gives an advantage to digital marketing?

Review responses, for example. Responding to reviews has been shown to have a positive impact on brands. Traditionally it’s been hard to respond to all the reviews for high-volume locations. Not anymore.

That’s one example, there are dozens of processes that are now relatively easy to automate due to LLMs.
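To make that concrete, here is a rough sketch of what that kind of automation can look like. This is a hypothetical example, not our actual pipeline: it assumes the OpenAI Python SDK, an API key in the environment, and a made-up review; the model name and prompts are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Made-up review text standing in for whatever the review platform hands you.
review = "Great coffee, but the line was 20 minutes long on Saturday."

draft = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[
        {
            "role": "system",
            "content": (
                "Write a short, polite reply to a customer review for a coffee shop. "
                "Acknowledge specifics and never promise anything you can't keep."
            ),
        },
        {"role": "user", "content": review},
    ],
)

print(draft.choices[0].message.content)  # a draft reply; a human still skims it before posting
```

Per location that's seconds of drafting instead of someone's afternoon, which is the whole point for high-volume brands.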

It’s just plain mean to make the Emperor speak of the thread count of his “clothes”.

My parents could have said your first paragraph when I tried to teach them they could Google their questions and find answers.

Technology moves forward and productivity improves for those that move with it.

A few examples, from my 'lived' experience, of technology that moved 'forward' but decreased productivity for those who moved with it:

1) CASE tools (and UML driven development)

2) Wizard driven code.

3) Distributed objects

4) Microservices

These all really were the hot thing with massive pressure to adopt them just like now. The Microsoft demos of Access wizards generating a complete solution for your business had that same wow feeling as LLM code. That's not to say that LLM code won't succeed but it is to say that this statement is definitely false:

> Technology moves forward and productivity improves for those that move with it.

> Technology moves forward and productivity improves for those that move with it.

It does not; technology regresses just as often, and linear deterministic progress is a myth to begin with. There is no guarantee for technology to move forward and always make things better.

There are plenty of examples to be made where technology has made certain things worse.

I would say it as "technology tends to concentrate power to those who wield it."

That's not all it does but I think it's one of the more important fundamentals.

Why is productivity so important? When do regular people get to benefit from all this "progress?"

Being permitted to eat - is that not great benefit?

"But with Google is easier!" When you were trying to teach your folks about Google, were you taking into consideration dependence, enshittification, or the surveillance economy? No, you were retelling them the marketing.

Just by having lived longer, they might've had the chance to develop some intuition about the true cost of disruption, and about how whatever Google's doing is not a free lunch. Of course, neither they, nor you (nor I for that matter) had been taught the conceptual tools to analyze some workings of some Ivy League whiz kids that have been assigned to be "eating the world" this generation.

Instead we've been incentivized to teach ourselves how to be motivated by post-hoc rationalizations. And ones we have to produce at our own expense too. Yummy.

Didn't Saint Google end up enshittifying people's very idea of how much "all of the world's knowledge" is; gatekeeping it in terms of breadth, depth and availability to however much of it makes AdSense? Which is already a whole lot of new useful stuff at your fingertips, sure. But when they said "organizing all of the world's knowledge" were they making any claims to the representativeness of the selection? No, they made the sure bet that it's not something the user would measure.

In fact, with this overwhelming amount of convincing non-experientially-backed knowledge being made available to everyone - not to mention the whole mass surveillance thing lol (smile, their AI will remember you forever) - what happens first and foremost is the individual becomes eminently marketable-to, way more deeply than over Teletext. Thinking they're able to independently make sense of all the available information, but instead falling prey to the most appealing narrative, not unlike a day trader getting a haircut on market day. And then one has to deal with even more people whose life is something someone sold to them, a race to the bottom in the commoditized activity (in the case of AI: language-based meaning-making).

But you didn't warn your parents about any of that or sit down and have a conversation about where it means things are headed. (For that matter, neither did they, even though presumably they've had their lives altered by the technological revolutions of their own day.) Instead, here you find yourself stepping in for that conversation to not happen among the public, either! "B-but it's obvious! G-get with it or get left behind!" So kind of you to advise me. Thankfully it's just what someone's paid for you to think. And that someone probably felt very productive paying big money for making people think the correct things, too, but opinions don't actually produce things do they? Even the ones that don't cost money to hold.

So if it's not about the productivity but about the obtaining of money to live, why not go extract that value from where it is, instead of breathing its informational exhaust? Oh, just because, figuratively speaking, it's always the banks that have AIs that don't balk at "how to rob the bank"; and it's always we that don't. Figures, no? But they don't let you in the vault for being part of the firewall.

Paul Krugman (Nobel laureate in economics) said in 1998 that the internet was no biggie. Many companies needed convincing to adopt the internet (heck, some still need convincing).

Would you say the same thing ("If it helped as much as it's claimed, there wouldn't be any need to try and convince people they should be using it.") about the internet?

I would, unironically.

The thing's called a self-fulfilling prophecy. Next level up from an MLM scheme: total bootstrap. Throwing shit at things is innate primate activity; use money to shift people's attention to a given thing for long enough and eventually they'll throw enough shit at the wall for something to stick. At which point it becomes something able to roll along with the market cycles.

"It is difficult to get a man to understand something when his salary depends upon his not understanding it"

It's also difficult to get a man to understand something if you stubbornly refuse to explain it.

Instead of this crypto-esque hand waving, maybe you can answer me now?

> Do you not use the intellisense?

Not your point, but I turned intellisense off years ago and haven't missed it. There's so much going on with IDE UIs now that having extra drop downs while typing was just too much. And copilot largely replaces the benefit of it anyway.

The big difference is that all of the other technologies you cite are deterministic, making it easy to predict their behavior.

You have to inspect the output of LLMs much more carefully to make sure they are doing what you want.

"Technology has always helped people. Are you one of the people that say optimizing compilers are bad? Do you not use the intellisense? Or IDEs? Do you not use higher level languages? Why not write in assembly all the time? No free lunch right."

Actually I am not a software engineer for a living so I have zero vested interest or bias lmao. That said, I studied Comp Sci at a top institution and really loved defining algorithms and had no interest in actual coding as it didn't give my brain the variety it needs.

If you are employed as a software engineer, I am probably more open to realizing the problems that only become obvious to you later on.

Someone already pointed out that we're at the point where it's no longer possible to know if comments like the above are satire or not.

Yep, my strong feeling is that the net benefit of all of this will be zero. The time you have to spend holding the LLM's hand is almost equal to how much time you would have spent writing it yourself. But then you've got yourself a codebase that you didn't write yourself, and we all know hunting bugs in someone else's code is way harder than in code you had a part in designing/writing.

People are honestly just drunk on this thing at this point. The sunk cost fallacy has people pushing on (i.e. spending more time) when LLMs aren't getting it right. People are happy to trade convenience for everything else; just look at junk food, where people trade away flavour and their health. And ultimately we are in a time when nobody is building for the future, it's all get-rich-quick schemes: squeeze, then get out before anyone asks why the river ran dry. LLMs are like the perfect drug for our current society.

Just look at how technology has helped us in the past decades. Instead of launching us towards some kind of Star Trek utopia, most people now just work more for less!

Only when purely vibe coding. AI currently saves a LOT of time if you get it to generate boilerplate, diagnose bugs, or assist with sandboxed issues.

The proof is in the pudding. The work I do takes me half as long as it used to and is just as high in quality, even though I manage and carefully curate the output.

I don't write much boilerplate anyway. I long ago figured out ways to not do that (I use a computer to do repetitive tasks for me). So when people talk about boilerplate I feel like they're only just catching up to me, not surpassing me.
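For example, here is a minimal sketch (names made up) of what I mean by letting the computer handle the repetition: in Python, one decorator generates the constructor, repr, and equality boilerplate that would otherwise have to be written, or generated, and then reviewed line by line.

```python
from dataclasses import dataclass

# @dataclass generates __init__, __repr__ and __eq__ for us,
# so there is no boilerplate to write or to wade through in review.
@dataclass
class User:
    name: str
    email: str
    age: int

u = User("Ada", "ada@example.com", 36)
print(u)                                        # User(name='Ada', email='ada@example.com', age=36)
print(u == User("Ada", "ada@example.com", 36))  # True: equality comes for free
```

The same idea scales up: code generators, macros, ORMs, scaffolding scripts, anything that makes the repetitive part a non-event rather than a wall of diff.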

As for catching bugs, maybe, but I feel like it's pot luck. Sometimes it can find something, sometimes it's just complete rubbish. Sometimes worth giving it a spin but still not convinced it's saving that much. Then again I don't spend much time hunting down bugs in unfamiliar code bases.

Like any tool, it has use cases where it excels and use cases where it’s pretty useless.

Unfamiliar code bases is a great example, if it’s able to find the bug it could do so almost instantly, as opposed to a human trying to read through the code base for ages. But for someone who is intimately familiar with a code base, they’ll probably solve the problem way faster, especially if it’s subtle.

Also, say your job is taking image designs and building them in HTML/CSS: just feeding it an image, getting it to dump out an HTML/CSS framework, and then cleaning up the details yourself will save you a lot of time. But on the flip side, if you need to make safety-critical software where every line matters, you’ll be way faster on your own.

People want to give a black and white “ai is bad” or “ai is great”, but the truth _as always_ is “it depends”. Humans aren’t very good at “it depends”.

I use AI for most of those things. And I think it probably saves me a bit of time.

But in that study that came out a few weeks ago where they actually looked at time saved, every single developer overestimated their time saved. To the point where even the ones who lost time thought they saved time.

LLMs are very good at making you feel like you’re saving time even when you aren’t. That doesn’t mean they can’t be a net productivity benefit.

But I’d be very very very surprised if you have real hard data to back up your feelings about your work taking you half as long and being equal quality.

That study predates Claude Code though.

I’m not surprised by the contents. I had the same feeling; I made some attempts at using LLMs for coding prior to CC, and with rare exceptions it never saved me any time.

CC changed that situation hugely, at least in my subjective view. It’s of course possible that it’s not as good as I feel it is, but I would at least want a new study.

I don’t believe that CC is so much better than cursor using Claude models that it moves the needle enough to flip the results of that study.

The key thing to look at is that even the participants that did objectively save time, overestimated time saved by a huge amount.

But also you’re always likely to be at least one model ahead of any studies that come out.

> That study predates Claude Code though.

Is there a study demonstrating Claude Code improves productivity?

I mean, I used to average 2 hours of intense work a day and now it’s 1 hour.

How are you tracking that? Are you keeping a log, or are you just guessing? Do you have a mostly objective definition of intense work or are you just basing it on how you feel? Is your situation at work otherwise exactly the same, or have you gotten into a better groove with your manager? Are you working on exactly the same thing? Have you leveled up with some more experience? Have you learned the domain better?

Is your work objectively the same quality? Is it possible that you are producing less but it’s still far above the minimum so no one has noticed? Is your work good enough for now, but a year from now when someone tries to change it, it will be a lot harder for them?

Based on the only real studies we have, humans grossly overestimate AI time savings. It’s highly likely you are too.

_sigh_. Really dude? Just because people overestimate them on average doesn’t mean every person does. In fact, you should be well versed enough in statistics to understand that it will be a spectrum that is highly dependent on both a person’s role and how they use it.

Any given new tool has a range of usefulness that depends on many factors and affects people differently as individuals. Just because a carpenter doesn’t save much time because Microsoft Excel exists doesn’t mean it’s not a hugely useful tool, and doesn’t mean it doesn’t save a lot of time for accountants, for example.

Instead of trying to tear apart my particular case, why not entertain the possibility that it’s more likely I’m reporting pretty accurately but it’s just I may be higher up that spectrum - with a good combo of having a perfect use case for the tool and also using the tool skilfully?

> _sigh_. Really dude? Just because people overestimate them on average doesn’t mean every person does.

In the study, every single person overestimated time saved on nearly every single task they measured.

Some people saved time, some didn’t. Some saved more time, some less. But every single person overestimated time saved by a large margin.

I’m not saying you aren’t saving time, but if you aren’t tracking things very carefully, it’s very likely that you are overestimating.

I’ll admit it’s possible my estimates are off a bit. What isn’t up for debate though is that it’s made a huge difference in my life and saved me a ton of time.

The fact that people overestimate its usefulness is somewhat of a “shrug” for me. So long as it _is_ making big differences, that’s still great whether people overestimate it or not.

If people overestimate time saved by huge margins, we don’t know whether it’s making big differences or not. Or more specifically whether the boost is worth the cost (both monetary and otherwise).

Only if we’re only using people’s opinions as data. There are other ways to do this.

My way of looking at this is simple.

What are people doing with this, quote-unquote, time that they have gained back? Working on new projects? OK, can you show me the impact on the financials (show me the money)? And then I usually get dead silence. And before someone mentions the layoffs - lmao, get real. It's offshoring 2.0 so that the large firms can increase their internal equity to keep funding this effort.

Most people are terrible at giving true informed opinions - they never dig deep enough to determine if what they are saying is proper.

Fast feedback is one benefit, given the 90% is releasable - even if only to a segment of users. This might be anathema to good engineering, but a benefit to user experience research and to organizations that want to test their market for demand.

Fast feedback is also great for improving release processes; when you have a feedback loop with Product, UX, Engineering, Security etc, being able to front load some % of a deliverable can help you make better decisions that may end up being a time saver net/net.

> And it's going to get worse!

That isn't clear given the fact that LLMs and, more importantly, LLM programming environments that manage context better are still improving.

> What bothers me the most is that I much prefer to build stuff rather than fixing code I'm not intimately familiar with.

Me too. But I think there's a split here. Some people love the new fast and loose way and rave about how they're experiencing more joy coding than ever before.

But I tried it briefly on a side project, and hated the feeling of disconnect. I started over, doing everything manually but boosted by AI, and it's deeply satisfying. There is just one section of AI-written code that I don't entirely understand: a complex SQL query I was having trouble writing myself. But at least with an SQL query it's very easy to verify the code does exactly what you want with no possibility of side effects.
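To illustrate what I mean by "easy to verify": a minimal sketch, with a made-up schema and a stand-in query rather than my actual one, is to run the AI-written query against a tiny fixture where the correct answer can be worked out by hand.

```python
import sqlite3

# Tiny in-memory fixture where the right answer is obvious by inspection.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'alice', 10.0), (2, 'bob', 25.0), (3, 'alice', 5.0);
""")

# Stand-in for the AI-written query I want to trust.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""

expected = [("bob", 25.0), ("alice", 15.0)]  # worked out by hand from the fixture above
assert conn.execute(query).fetchall() == expected
print("query matches the hand-checked result")
```

Since it's a pure SELECT, there's nothing to roll back and no side effects to worry about; the query either returns the rows I expect or it doesn't.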

I'd argue that this awareness is a good thing; it means you're measuring, analyzing, etc all the code.

Best practices in software development for forever have been to verify everything; CI, code reviews, unit tests, linters, etc. I'd argue that with LLM generated code, a software developer's job and/or that of an organization as a whole has shifted even more towards reviewing and verification.

If quality is taking a hit you need to stop; how important is quality to you? How do you define quality in your organization? And what steps do you take to ensure and improve quality before merging LLM generated code? Remember that you're still the boss and there is no excuse for merging substandard code.

Imagine someone adds 10 carefully devised unit tests and a reviewer notices they need one more during the PR.

Scenario B: you add 40 with an LLM that look good on paper but only cover 6 of the original ones. Besides, who's going to pay careful attention to a PR with 40 tests?

"Must be so thorough!".

As Fowler himself states, there's a need to learn to use these tools properly.

In any case, poor work quality is a failure of tech leadership and culture; it's not AI's fault.

It’s funny how nothing seems to be AI’s fault.

That's because it's software / an application. I don't blame my editor for broken code either. You can't put blame on software itself, it just does what it's programmed to do.

But also, blameless culture is IMO important in software development. If a bug ends up in production, whose fault is it? The developer that wrote the code? The LLM that generated it? The reviewer that approved it? The product owner that decided a feature should be built? The tester that missed the bug? The engineering organization that has a gap in their CI?

As with the Therac-25 incident, it's never one cause: https://news.ycombinator.com/item?id=45036294

Blameless culture is important for a lot of reasons, but many of them are human. LLMs are just tools. If one of the issues identified in a post-mortem is "using this particular tool is causing us problems", there's not a blameless culture out there that would say "We can't blame the tool..."; the action item is "Figure out how to improve/replace/remove the tool so it no longer contributes to problems."

Blame is purely social and purely human. “Blaming” a tool or process and root causing are functionally identical. Misattributing an outage to a single failure is certainly one way to fail to fix a process. Failing to identify faulty tools/ faulty applications is another way.

I was being flippant to say it’s never AI’s fault, but due to board/C-Suite pressure it’s harder than ever to point out the ways that AI makes processes more complex, harder to reason about, stochastic, and expensive. So we end up with problems that have to be attributed to something not AI.

> You can't put blame on software itself, it just does what it's programmed to do.

This isn't what AI enthusiasts say about AI, though; they only bring that up when they get defensive, but then go around and say it will totally replace software engineers and is not just a tool.

If poor work gets merged, the responsibility lies in who wrote it, who merged it, and who allows such a culture.

The tools used do not hold responsibilities, they are tools.

"I got rid of that machine saw. Every so often it made a cut that was slightly off line but it was hard to see. I might not find out until much later and then have to redo everything."

How could a tool be at fault? If an airplane crashes is the plane at fault or the designers, engineers, and/or pilot?

Designers, engineers, and/or pilots aren't tools, so that's a strange rhetorical question.

At any rate, it depends on the crash. The NTSB will investigate and release findings that very well may assign fault to the design of the plane and/or pilot or even tools the pilot was using, and will make recommendations about how to avoid a similar crash in the future, which could include discontinuing the use of certain tools.

If your toaster burns your breakfast bread, do you ultimately blame "it"?

You get mad, swear at it, maybe even throw it at the wall in a fit of rage but, at the end of the day, deep inside you still know you screwed up.

Devices can be faulty and technology can be inappropriate.

If I bought an AI powered toaster that allows me to select a desired shade of toast, I select light golden brown, and it burns my toast, I certainly do blame “it”.

I wouldn’t throw it against a wall because I’m not a psychopath, but I would demand my money back.

No one seems to be able to grasp the possibility that AI is a failure.

> No one seems to be able to grasp the possibility that AI is a failure.

Do you think by the time GPT-9 comes, we'll say "That's it, AI is a failure, we'll just stop using it!"

Or do you speak in metaphorical/bigger picture/"butlerian jihad" terms?

I don't see the use case now; maybe there will be one by GPT-9.

Absence of your need isn't evidence of no need.

This is true, but I've never heard of a use case. To which you might reply, "doesn't mean there isn't one," which you would be also right about.

Maybe you know one.

I presume your definition of use case is something that doesn't include what people normally use it for. And I presume me using it for coding every day is disqualified as well.

I didn't mean to suggest it has no utility at all. That's obviously wrong (same for crypto). I meant a use case in line with the projections the companies have claimed (multiple trillions). Help with basic coding (of which efficiency gains are still speculative) is not a multi-trillion dollar business.

You've failed to figure out when and how to use it. It's not a binary failed/succeeded thing.

None of the copyright issues or suicide cases have been resolved in court yet. There are many aspects.

Metaverse was...

“There’s no use for this thing!” - said the farmer about the computer