> a marked improvement in our user engagement in a day.

Do you have any idea what caused this engagement improvement? And do you actually have any metrics, or is it hearsay?

It is much easier to knock something up in a day, as you have done, but often the reason manual work takes longer is that it is based on actual testing and research, which takes longer than a day however you do it. The manual way gives you much more data on the hows and whys, and will inform you much better in the future when you need to change things again, instead of just 'AI did it last time, let's use it again!'

No, we did an actual test using our existing testing framework. We have shitloads of metrics to know when a user gets stuck, when they give up, which login path they took, etc.

This wasn't a half-assed test but a legitimate effort to improve something that we never prioritized.

We had a legitimate 25% reduction in users giving up on logging in, in a system that has millions of users.

We ran a 50-50 A/B test for several weeks to confirm the data and then turned it on completely.

edit: If you haven't already read my post, I'd also like to say that the benefit AI gives us is that I worked on something I never get to work on, the analyst got to try a hunch he always had, and we got to see it go live in a day. If it hadn't worked out, we would have been out a day of work, which beats the few weeks of effort we would have spent on something before AI just to find out it didn't work.

This seems consistent with OP. You had a feature where most of his Gantt chart was, in effect, already done: you had a clear problem with a clear, well-thought-out design/solution (with associated documentation) in mind, and you had a well-set-up analytics process for deployment and follow-up... you really had everything except that big fat chunk in the middle labeled 'coding'. So in your anecdote, an agentic coding LLM really could deliver a huge speedup by doing the remaining 10% or whatever of the work.

This is why LLMs are really great at 'knocking off the todo/wishlist' of things you always meant to do. The problem, as far as broader discussions of 'productivity multipliers' or 'total factor productivity' go, is threefold. There are perverse diminishing returns to such wishlist items (if each item was all that important, why didn't it get done before?). They generally apply to only a small part of a large, complicated whole (what % of your ecosystem/business/community as a whole is the login page, however pleasing and profitable that fix is relative to the investment? Probably not a big %). And they are finite (what happens when you have worked through your backlog of low-hanging fruit?).

I ask myself these same questions every workday. Are you cooking any new articles on this topic, Gwern? Reading your (thoroughly researched) thoughts often helps me clarify my own.