> Mistakes are rampant in spreadsheets

To me, the case for LLMs is strongest not because LLMs are so unusually accurate and awesome, but because if human performance were put on trial in aggregate, it would be found wanting.

Humans already do a mediocre job with spreadsheets, so I don't think it's a given that Claude will make more mistakes than humans do.

But isn't this only fine as long as someone who knows what they are doing has oversight and can fix issues when they arise and Claude gets stuck?

Once we all forget how to write SUM(A:A), will we just invent a new kind of spreadsheet once Claude gets stuck?

Or, in other words: what's the end game here? LLMs clearly cannot be left alone to do anything properly, so what's the end game of making people stop learning anything?

Well, the end game with AI is AGI, of course. But realistically, the best-case scenario with LLMs is having fewer people with the required knowledge, leveraging LLMs to massively enhance productivity.

We’re already there to some degree. It is hard to put a number on my productivity gain, but as a small business owner with a growing software company it’s clear to me already that I can reduce developer hiring going forward.

When I read the skeptics, I just have to conclude that they're either poor at context building, working on messy, inconsistent, and poorly documented projects, or both.

My sense is that many weaker developers who can't learn these tools simply won't compete in the new environment. Those who can build well-designed, well-documented projects with deep context that is easy for LLMs to digest will thrive.

I assume all of this applies to spreadsheets.

Why isn't there a single study that would back up your observations? The only study with a representative experimental design that I know about is the METR study and it showed the opposite. Every study citing significant productivity improvements that I've seen is either:

- relying on self-assessments from developers about how much time they think they saved, or

- using useless metrics like lines of code produced or PRs opened, or

- timing developers on toy programming assignments like implementing a basic HTTP server that aren't representative of the real world.

Why is it that any time I ask people to provide examples of high quality software projects that were predominantly LLM-generated (with video evidence to document the process and allow us to judge the velocity), nobody ever answers the call? Would you like to change that?

My sense is that weaker developers and especially weaker leaders are easily impressed and fascinated by substandard results :)

Everything Claude does is reviewed by me; nothing enters the code base that doesn't meet the standard we've always kept. Perhaps I'm substandard and weak, but my software is stable, my customers are happy, and I'm delivering value to them quicker than I was previously.

I don't know how you could effectively study such a thing; that avenue seems like a dead end. The truth will become obvious in time.

Okay, and now you give those mediocre humans a tool that is both great and terrible. The problem is, unless they know their way around very well, they won't know which is which.

Since my company uses Excel a lot, and I know the basics but don't want to become an expert, I use LLMs to ask intermediate questions: ones that are too hard to answer with the few formulas I know, but not too hard for a short solution path.

I have great success and definitely like what I can get from the Excel/LLM combo. But if my colleagues used it the same way, they would not get my good results, and that's not their fault: they are not IT people but specialists in, say, logistics. The best use of LLMs is when you could already do the job without them, but it saves you time to ask them and then check whether the result is actually acceptable.
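To make "intermediate" concrete, here is a made-up logistics example (the sheet layout and names are my own invention, not anything from this thread): ship dates in column A, regions in column B, pallet counts in column C. Asking an LLM for "total pallets shipped to the EU in January 2025" will typically get you something like:

```
=SUMIFS(C:C, B:B, "EU", A:A, ">="&DATE(2025,1,1), A:A, "<"&DATE(2025,2,1))
```

That is beyond the few formulas I know by heart, but short enough that I can read it, spot-check it against a handful of rows, and judge whether the result is actually acceptable.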

Sometimes I abandon the LLM session entirely, because fixing the broken result would take more effort than just doing it the old way myself, and it's not always easy to predict when that will be the case.

A big problem is that LLMs are so darn confident and always present a result. For example, I point one at a problem, it "thinks", and then it gives me new code, very confidently summarizes what the problem was (correctly), and assures me it has now fixed it for sure. Only when I actually try the result, it has gotten worse than before. At that point I never try to get back to a working solution by continuing to "talk" to the AI; I just delete that session and take another, non-AI approach.

But non-experts, and people who are very busy and just want some result to forward to someone waiting for it as quickly as possible, will be tempted to accept the nice-looking and confidently presented "solution" as-is. And you may not find a problem until half a year later, when somebody notices that prepayments, pro forma bills, and final invoices don't quite match in hard-to-follow ways.

Not that these things don't happen already, but adding a tool with erratic results might increase the problems, depending on the actual implementation of the process. Which most likely won't be well thought out: many will just cram in the new tool and assume it works because it doesn't implode right away, and because the first results, produced while people are still paying close attention and being careful, all look good.

I am in awe of the accomplishments of this new tool, but it is way overhyped IMHO, still far too unpolished and random. Forcing all kinds of processes and people to use it is not a good fit, I think.

This is a great point. LLMs make good developers better, but they make bad developers even worse: LLMs multiply value rather than add it. So if you're a good developer, who is careful, pays attention, watches out for trouble, and is constantly reviewing and steering, the LLM is multiplying a positive number and will make you better. However, if you're a mediocre/bad developer, who is not careful, who lacks attention to detail, and just barely gets things to compile and run, then the LLM is multiplying a negative number and will make your output even worse.