I think there's an interesting dichotomy. I find that for things I'm already capable at, LLMs are relatively inconsequential. But for things I'm no good at, it's a huge game changer. For a large company that can hire for most of the roles a given project needs, this means the overall effect is going to be relatively inconsequential. At best, they may be able to cut labor costs by having one guy do a mediocre job of 5 people's jobs, in exchange for a worse product. Short-term gains for long-term costs, what could go wrong?

But for a small studio or an independent developer, LLMs are a big game changer. Being able to do a mediocre job of 5 people's jobs is a huge leap over trying to get by without those people: relying on third-party assets or other borrowed content or, even worse, doing a really awful job of improvising those roles yourself. See the UI of basically any program ever that was clearly laid out by a programmer and not a designer. Or there's the whole business of trying to rip off designs from Dribbble without the skills to pull it off. Whereas with AI, you can suddenly competently rip off everything and everybody; it's basically their entire MO.

> I find that for things I'm already capable at, LLMs are relatively inconsequential. But for things I'm no good at, it's a huge game changer.

What are the chances that this is the Gell-Mann amnesia effect? Sounds like the textbook definition of it.

Personally, I find the exact opposite to be true. LLMs only help me when I already know exactly what I'm doing.

I can give an anecdote. I'm a backend engineer for a service that I would consider pretty high horsepower. We get about 30k sign-ups and trillions of events a day. I haven't touched the front end with a 10-foot pole since college.

I got the opportunity to rewrite our aging login page just as a fun experiment. I sat down with one of our analysts on a Zoom call and we just went to town trying out stuff with Claude until we made something pretty sweet. We ran it through all our systems for accessibility, performance, etc., and it came out clean. I made a PR and fired up a test in production that day. I haven't written a lick of our front-end framework in my entire life, and we were able to build something that has had a marked improvement in our user engagement in a day.

> a marked improvement in our user engagement in a day.

Do you have any idea what caused this engagement improvement? And do you actually have any metrics, or is it hearsay?

It is much easier to knock something up in a day, as you have done, but often the reason manual work takes longer is that it is based on actual testing and research, which takes longer than a day however you do it. The manual way gives you much more data on the hows and whys, and will inform you much better in the future when you need to change it again, instead of just 'AI did it last time, let's use it again!'

No, we did an actual test using our existing testing framework. We have shitloads of metrics to know when a user gets stuck, when they give up, which login path they took, etc.

This wasn't a half-assed test but a legitimate effort to improve something that we had never prioritized.

We had a legitimate 25% reduction in users giving up on logging in, in a system that has millions of users.

We ran a 50-50 A/B test for several weeks to confirm the data and then turned it on completely.

edit: If you haven't already read my post, I'd also like to say that the benefit AI gives us is that I worked on something I never get to work on, the analyst got to try a hunch he'd always had, and we got to see it go live in a day. If it hadn't worked out, we would have been out a day of work, which beats the few weeks of effort we used to spend before AI just to find out something didn't work.

This seems consistent with OP. You had a feature where most of his Gantt chart was, in effect, already done: you had a clear problem with a clear, well-thought-out design/solution (with associated documentation) in mind, and you had a well-set-up analytics process for deployment and follow-up... you really had everything except that big fat chunk in the middle labeled 'coding'. So in your anecdote, an agentic coding LLM really could deliver a huge speedup by doing the remaining 10% or whatever of the work.

This is why LLMs are really great at 'knocking off the todo/wishlist' of things you always meant to do. The problem, as far as broader discussions of 'productivity multipliers' or 'total factor productivity' go, is threefold: there are perverse diminishing returns to such wishlist items (if each item was all that important, why didn't it get done before?); they generally apply to only a small part of a large, complicated whole (what % of your ecosystem/business/community as a whole is the login page, as pleasing and profitable as that fix is relative to the investment? Probably not a big %); and they are finite (what happens when you have worked through your backlog of low-hanging fruit?).

I ask myself these same questions every workday. Are you cooking up any new articles on this topic, Gwern? Reading your (thoroughly researched) thoughts often helps me clarify my own.

Just because one isn’t good at a thing doesn’t preclude one from being a sufficiently passable judge of a thing.

To wit, the answer pre-AI was to hire an expert on that thing, and you would then critically assess their work product, despite being unable to build it yourself.

True, but if you hire a generalist and they are consistently under-performing specifically in the subject matter where you are an expert, it may behoove you to take the rest of their work with a grain of salt as well.

The key is to understand what someone is actually good at, rather than lump them into some amorphous "generalist" category. Along with (presumptively) broad experience, a generalist is just a specialist at various things which often feel obtuse or reductive to delineate, e.g. "I'm a specialist at rapidly narrowing vague failures into specific causes, assessing scalability trade-offs, understanding edge cases at the intersection of two programming languages, and optimising cache invalidation."

Perhaps the best generalist skill when working in teams of specialists is "a reasonably accurate bullshit detector."