That's not a problem, that is the argument. People are bad at measuring their own productivity. Just because you feel more productive with an LLM does not mean you are. We need more studies and less anecdata.

I'm afraid all you're going to get from me is anecdata, but I find a lot of it very compelling.

I talk to extremely experienced programmers, people whose opinions I had valued for many years before the current LLM boom, who are now flying with LLMs - I trust their aggregate judgement.

Meanwhile my own https://tools.simonwillison.net/colophon collection has grown to over 120 in just a year and a half, most of which I wouldn't have built at all - and that's a relatively small portion of what I've been getting done with LLMs elsewhere.

Hard to measure productivity on a "wouldn't exist" to "does exist" scale.

Every time you post about this stuff you get at least as much pushback as you get affirmation, and yet when you discuss anything related to peer responses, you never seem to mention or include any of that negative feedback, only the positive...

I don't get it, what are you asking me to do here?

You want me to say "this stuff is really useful, here's why I think that. But lots of people on the internet have disagreed with me, here's links to their comments"?

> my own https://tools.simonwillison.net/colophon collection has grown to over 120

What in the wooberjabbery is this even.

A list of single-commit, LLM-generated stuff. Vibe-coded shovelware like animated-rainbow-border [1] or unix-timestamp [2].

Calling these tools seems to be overstating it.

1: https://gist.github.com/simonw/2e56ee84e7321592f79ceaed2e81b...

2: https://gist.github.com/simonw/8c04788c5e4db11f6324ef5962127...

Cool, right? It's my playground for vibe-coded apps, except I started it nearly a year before the term "vibe coding" was introduced.

I wrote more about it here: https://simonwillison.net/2024/Oct/21/claude-artifacts/ - and a lot of them have explanations in posts under my tools tag: https://simonwillison.net/tags/tools/

It might also be the largest collection of published chat transcripts for this kind of usage from a single person - though that's not hard since most people don't publish their prompts.

Building little things like this is a really effective way of gaining experience at using prompts to get useful code out of LLMs.
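For a sense of scale: something like the unix-timestamp converter mentioned above boils down to a few lines of conversion logic. The actual tools in that collection are single-page HTML/JavaScript apps; this Python sketch of the core conversion is my own illustration, not code from the collection:

```python
from datetime import datetime, timezone

def timestamp_to_iso(ts: float) -> str:
    """Convert a Unix timestamp (seconds since the epoch) to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

def iso_to_timestamp(iso: str) -> float:
    """Convert an ISO 8601 string (with a UTC offset) back to a Unix timestamp."""
    return datetime.fromisoformat(iso).timestamp()

print(timestamp_to_iso(0))  # 1970-01-01T00:00:00+00:00
```

The point isn't that this code is hard to write by hand - it's that the whole wrapper app around it (UI, clipboard handling, edge cases) can be prompted into existence in minutes, which is exactly the kind of low-stakes practice that builds prompting skill.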

> Cool right?

Hundreds of single-commit, AI-generated pieces of trash in the vein of "make the css background blue".

On display.

Like it's something.

You can't be serious.

[flagged]

I've been using LLM-assistance for my larger open source projects - https://github.com/simonw/datasette https://github.com/simonw/llm and https://github.com/simonw/sqlite-utils - for a couple of years now.

Also literally hundreds of smaller plugins and libraries and CLI tools, see https://github.com/simonw?tab=repositories (now at 880 repos, though a few dozen of those are scrapers and shouldn't count) and https://pypi.org/user/simonw/ (340 published packages).

Unlike my tools.simonwillison.net stuff, the vast majority of those projects are covered by automated tests and usually have comprehensive documentation too.

What do you mean by my script?

The whole debate about LLMs and productivity consistently brings the "don't confuse movement with progress" warning to my mind.

But it was a useful warning even before LLMs because, as you wrote, people are bad at measuring productivity (among many other things).