Seems like you're very confused about what this work typically entails. The job of these employees is not mental arithmetic. It's closer to:

- Log in to the internal system that handles customer policies

- Find all policies that were bound in the last 30 days

- Log in to the internal system that manages customer payments

- Verify that for all policies bound, there exists a corresponding payment that roughly matches the premium.

- Flag any divergences above X% for accounting/finance to follow up on.

Practically this involves munging a few CSVs, maybe typing in a few things, setting up some XLOOKUPs, IF formulas, conditional formatting, etc.
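The reconciliation steps above can be sketched in a few lines of Python instead of XLOOKUPs and IF formulas. This is a minimal sketch under stated assumptions: the two systems export CSVs with hypothetical columns `policy_id`, `bound_date`, `premium` (policies) and `policy_id`, `amount` (payments), and the 5% default for X is a placeholder, not a recommendation.

```python
import csv
import io
from datetime import date, timedelta


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse one CSV export into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def reconcile(policies_csv: str, payments_csv: str, as_of: date,
              window_days: int = 30, threshold_pct: float = 5.0) -> list[tuple[str, str]]:
    """Return (policy_id, reason) pairs for accounting/finance to follow up on."""
    cutoff = as_of - timedelta(days=window_days)

    # Total payments per policy (the XLOOKUP/SUMIF step).
    paid: dict[str, float] = {}
    for row in load_rows(payments_csv):
        paid[row["policy_id"]] = paid.get(row["policy_id"], 0.0) + float(row["amount"])

    flags = []
    for row in load_rows(policies_csv):
        bound = date.fromisoformat(row["bound_date"])
        if not cutoff <= bound <= as_of:
            continue  # only policies bound in the last `window_days` days
        policy_id, premium = row["policy_id"], float(row["premium"])
        if policy_id not in paid:
            flags.append((policy_id, "no payment found"))
        elif premium <= 0:
            flags.append((policy_id, "non-positive premium"))
        else:
            diff_pct = abs(paid[policy_id] - premium) / premium * 100
            if diff_pct > threshold_pct:  # the "divergence above X%" rule
                flags.append((policy_id, f"payment differs from premium by {diff_pct:.1f}%"))
    return flags
```

Unlike a chain of spreadsheet formulas, the whole rule lives in one readable place, so a reviewer can check the logic once rather than cell by cell.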

Will AI replace the entire job? No...but that's not the goal. Does it have to be perfect? Also no...the existing employees performing this work are also not perfect, and in fact sometimes their accuracy is quite poor.

> “Does it have to be perfect?”

Actually, yes. This kind of management reporting is either (1) going to end up in the books and records of the company (big trouble if things have to be restated in the future) or (2) going to support important decisions by leadership, who will be far less than happy if the analysis turns out to have been wrong.

A lot of what ties up the time of business analysts is ticking and tying everything to ensure that mistakes are not made and that analytics and interpretations are consistent from one period to the next. The math and queries are simple - the details and correctness are hard.

Is this not belligerently ignoring the fact that this work is already done imperfectly? I can’t tell you how many serious errors I’ve caught in just a short time of automating the generation of complex spreadsheets from financial data. All of them had already been checked by multiple analysts, and all of them contained serious errors (in different places!)

No belligerence intended! Yes, processes are faulty today even with maker-checker and other QA procedures. To me it seems the main value of LLMs in a spreadsheet-heavy process is acceleration - which is great! What is harder is quality assurance - like the example someone gave regarding deciding when and how to include or exclude certain tables, date ranges, calculations, etc. Properly recording expert judgment and then consistently applying that judgment over time is key. I’m not sure that is the kind of thing LLMs are great at, even ignoring their stochastic nature. Let’s figure out how to get the best use out of the new kit - and like everything else, focus on achieving continuously improving outcomes.

There are actually different classes of errors, though. There are errors in the process itself versus errors that happen when performing the process.

For example, if I ask you to tabulate orders via a query but you forgot to include an entire table, this is a major error of process but the query itself actually is consistently error-free.

Reducing errors is very much about modeling where errors can happen. I never trust an LLM to interpret data from a spreadsheet because I cannot verify every individual result, but I am willing to ask an LLM to write a macro that tabulates the data, because I can verify the algorithm and the macro's result will always be consistent.

Using Claude to interpret the data directly for me is scary because those kinds of errors are neither verifiable nor consistent. At least with the “missing table” example, that error may make the analysis completely bunk but once it is corrected, it is always correct.
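One way to act on this split between process errors and execution errors is to make the process itself explicit. The sketch below is an illustration under assumptions (an in-memory SQLite database and hypothetical regional tables `orders_east` and `orders_west`): the tabulation is deterministic, and it refuses to run if a listed table is missing or an unlisted `orders_` table exists, so a "forgot a table" error surfaces once and, after the list is fixed, stays fixed.

```python
import sqlite3

# Hypothetical: orders are split across regional tables. Listing them explicitly
# turns "forgot to include a table" into a visible, checkable part of the process.
ORDER_TABLES = ("orders_east", "orders_west")


def tabulate_orders(conn: sqlite3.Connection, tables: tuple[str, ...] = ORDER_TABLES) -> dict[str, float]:
    """Total order amount per product across every listed table."""
    existing = {r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")}
    missing = [t for t in tables if t not in existing]
    if missing:
        raise ValueError(f"expected tables not found: {missing}")
    unlisted = sorted(t for t in existing if t.startswith("orders_") and t not in tables)
    if unlisted:
        raise ValueError(f"order tables not included in tabulation: {unlisted}")
    # Table names come from the fixed list above, never from user input.
    union = " UNION ALL ".join(f"SELECT product, amount FROM {t}" for t in tables)
    query = f"SELECT product, SUM(amount) FROM ({union}) GROUP BY product ORDER BY product"
    return {product: total for product, total in conn.execute(query)}
```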

Very much agreed

Speak for yourself and your own use cases. There is a huge diversity of workflows to which automation can be applied in any medium-to-large business. They all have differing needs. Many Excel workflows I'm personally familiar with already incorporate a "human review" step. Telling a business leader that they can now jump straight to that step, even if it requires 2x human review, with AI doing all of the most tedious and low-stakes prework, is a clear win.

>Speak for yourself and your own use cases

Take your own advice.

I'm taking a much weaker position than the respondent: LLMs are useful for many classes of problem that do not require zero shot perfect accuracy. They are useful in contexts where the cost of building scaffolding around them to get their accuracy to an acceptable level is less than the cost of hiring humans to do the same work to the same degree of accuracy.

This is basic business and engineering 101.

>LLMs are useful for many classes of problem that do not require zero shot perfect accuracy. They are useful in contexts where the cost of building scaffolding around them to get their accuracy to an acceptable level is less than the cost of hiring humans to do the same work to the same degree of accuracy.

Well said. Concise and essentially inarguable, at least to the extent it means LLMs are here to stay in the business world whether anyone likes it or not (barring the unforeseen, e.g. regulation or another pressure).

There is another aspect to this kind of activity.

Sometimes there can be an advantage in leading or lagging some aspects of internal accounting data for a time period. Basically sitting on credits or debits to some accounts for a period of weeks. The tacit knowledge to know when to sit on a transaction and when to action it is generally not written down in formal terms.

I'm not sure how these shenanigans will translate into an ai driven system.

> Sometimes there can be an advantage in leading or lagging some aspects of internal accounting data for a time period.

This worked famously well for Enron.

That’s the kind of thing that can get a company into a lot of trouble with its auditors and shareholders. Not that I am offering accounting advice, of course. And yeah, one cannot “blame” an AI system or try to AI-wash any dodgy practices.

Checking someone else's spreadsheet is a fucking nightmare. If your company has extremely good standards it's less miserable, because at least the formatting etc. will be consistent...

The one thing LLMs should consistently do is ensure that formatting is correct, which will help greatly in the checking process. But no, I generally don't trust them to do sensible things with basic formulation. Not a week ago, GPT-5 got confused about whether a plus or a minus was needed in the basic question "I'm 323 days old, when is my birthday?"
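For the record, the birthday question is one subtraction, which is exactly the kind of step worth handing to code rather than to a model. A minimal Python sketch; the function names are illustrative and the Feb 29 fallback to Feb 28 is an assumed convention:

```python
from datetime import date, timedelta


def birth_date(age_days: int, today: date) -> date:
    """The birth date lies `age_days` in the past, so subtract rather than add."""
    return today - timedelta(days=age_days)


def _anniversary(born: date, year: int) -> date:
    try:
        return born.replace(year=year)
    except ValueError:  # born on Feb 29 and `year` is not a leap year
        return date(year, 2, 28)  # assumed convention; some use Mar 1


def next_birthday(born: date, today: date) -> date:
    """The next anniversary of `born` falling on or after `today`."""
    candidate = _anniversary(born, today.year)
    if candidate < today:
        candidate = _anniversary(born, today.year + 1)
    return candidate
```

For example, with `today = date(2025, 6, 1)`, `birth_date(323, today)` is `date(2024, 7, 13)`, and the next birthday is 42 days later, on 2025-07-13.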

I think you have a misunderstanding of the types of things that LLMs are good at. Yes you're 100% right that they can't do math. Yet they're quite proficient at basic coding. Most Excel work is similar to basic coding so I think this is an area where they might actually be pretty well suited.

My concern would be more with how to check the work (i.e., make sure that the formulas are correct and no columns are missed), because Excel hides all that. Unlike code, there's no easy way to generate the diff of a spreadsheet or rely on Git history. But that's different from the concerns that you have.
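A cell-level diff is not hard once a workbook is flattened into a mapping from cell address to contents. The sketch assumes that flattening has already happened (for `.xlsx` files, openpyxl's `load_workbook` with the default `data_only=False` returns formulas as strings starting with `=`); the `Sheet` alias and the function names are hypothetical.

```python
import re
from typing import Any

# Hypothetical flattened form: cell address -> formula string ("=SUM(B2:B9)") or value.
Sheet = dict[str, Any]


def _sort_key(cell: str) -> tuple[int, int, str]:
    """Order cells row by row, then by column (so "Z1" sorts before "AA1")."""
    letters, row = re.fullmatch(r"([A-Z]+)(\d+)", cell).groups()
    return (int(row), len(letters), letters)


def diff_sheets(before: Sheet, after: Sheet) -> list[tuple[str, str, Any, Any]]:
    """Return (cell, kind, old, new) for every cell whose contents differ."""
    changes = []
    for cell in sorted(before.keys() | after.keys(), key=_sort_key):
        old, new = before.get(cell), after.get(cell)
        if old == new:
            continue
        kind = "added" if old is None else "removed" if new is None else "changed"
        changes.append((cell, kind, old, new))
    return changes
```

Because formulas are compared as text rather than as computed values, a changed formula shows up even when it happens to produce the same number.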

> Yes you're 100% right that they can't do math.

The model ought to be calling out to some sort of tool to do the math—effectively writing code, which it can do. I'm surprised the major LLM frontends aren't always doing this by now.
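A minimal sketch of such a tool, assuming a hypothetical `calculator` function that a frontend would expose to the model: it evaluates plain arithmetic by walking Python's `ast` instead of calling `eval`, so the model only has to produce the expression, not the answer.

```python
import ast
import operator

# Arithmetic operators the hypothetical tool accepts; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculator(expression: str) -> float:
    """Evaluate +, -, *, / and parentheses on numeric literals, without eval()."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))
```

Exponentiation is deliberately left out so that a malformed request cannot ask for an enormous power.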

MS Office Tools menu has a "Spreadsheet Compare" application. It is quite good for diffing 2 spreadsheets. Of course it cannot catch logic errors, human or ML.

I've built spreadsheet diff tools on Google Sheets multiple times. As the need grows, I think we will see diffs, commits, and review tools reach customers.

hey Collin! I am working on an AI agent on Google Sheets and am curious whether any of your designs are public. We are trying to rethink how diffs should look and want to make something nicer than what we currently have.

Hi! Nothing public, nor generic enough to be a good building block. I often found myself frustrated by the out-of-the-box tools, but I believe better APIs could make this slightly easier to solve.

The UX of spreadsheet diffs is a hard one to solve because of how weird the calculation loops are and how complicated the relationship between fields might be.

I've never tried to solve this for a real end user before in a generic way - all my past work here was for internal ability to audit changes and rollback catastrophes. I took a lot of shortcuts by knowing which cells are input data vs various steps of calculations -- maybe part of your ux is being able to define that on a sheet by sheet basis? Then you could show how different data (same formulas) changed outputs or how different formulas (same data) did differently?
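The input-versus-calculation shortcut can be sketched as a classifier over two flattened snapshots (cell address to value or `=` formula). The `input_cells` set is assumed to be declared per sheet by whoever owns it, as suggested above; the function names are illustrative.

```python
def _is_formula(value: object) -> bool:
    return isinstance(value, str) and value.startswith("=")


def classify_changes(before: dict, after: dict, input_cells: set[str]) -> dict[str, list[str]]:
    """Split changed cells into declared-input edits, formula edits, and everything else."""
    result: dict[str, list[str]] = {"inputs": [], "formulas": [], "other": []}
    for cell in sorted(before.keys() | after.keys()):
        old, new = before.get(cell), after.get(cell)
        if old == new:
            continue
        if cell in input_cells:
            result["inputs"].append(cell)  # same formulas, different data
        elif _is_formula(old) or _is_formula(new):
            result["formulas"].append(cell)  # includes a formula overwritten by a constant
        else:
            result["other"].append(cell)  # e.g. labels or constants outside declared inputs
    return result
```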

Spreadsheets are basically weird app platforms at this point, so you might not be able to create a single experience that is both deep and generic. On the other hand, maybe treating it as an app is the unlock? Get your AI to noodle on what the whole thing is for, then show the diff between the before and after stable states (after all calculation loops stabilize or are killed) side by side with actual diffs of actual formulas? I feel like I'd want to see a diff as a live final spreadsheet and be able to click on changed cells and see up the chain of their calculations to the ancestors that were modified.

Fun problem that sounds extremely complicated. Good luck distilling it!

> Most Excel work is similar to basic coding

Excel is similar to coding in BASIC, a giant hairy ball of tangled wool.

So do it in basic code where numbering your line G53 instead of G$53 doesn't crash a mass transit network because somebody's algorithm forgot to order enough fuel this month.
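The `G53` versus `G$53` distinction is about what happens when a formula is copied: unanchored rows shift with the copy, `$`-anchored rows stay put. A small sketch with a hypothetical `fill_down` helper that mimics filling a formula down (a simplification of Excel's behavior that ignores copying across columns and `#REF!` edge cases):

```python
import re

# Groups: column anchor, column letters, row anchor, row digits.
_REF = re.compile(r"(?<![A-Z])(\$?)([A-Z]+)(\$?)(\d+)(?![\d(])")


def fill_down(formula: str, rows: int) -> str:
    """Mimic copying a formula `rows` rows down: relative rows shift, $-anchored rows do not."""

    def shift(match: re.Match) -> str:
        col_anchor, col, row_anchor, row = match.groups()
        new_row = int(row) if row_anchor else int(row) + rows
        return f"{col_anchor}{col}{row_anchor}{new_row}"

    return _REF.sub(shift, formula)
```

Copied one row down, `=B2*G53` becomes `=B3*G54`, silently pointing at a different input, while `=B2*G$53` keeps reading `G$53`.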

proficient != near-flawless.

> Most Excel work is similar to basic coding so I think this is an area where they might actually be pretty well suited.

This is a hot take. One I'm not sure many would agree with.

Excel work of people who make a living because of their excel skills (Bankers, VCs, Finance pros) is truly on the spectrum of basic coding. Excel use by others (Strategy, HR, etc.) is more like crude UI to manipulate small datasets (filter, sort, add, share and collaborate). Source: have lived both lives.

Maybe LLMs will enable a new type of work in spreadsheets. Just like in coding we have PR reviews, with an LLM it should be possible to do a spreadsheet review. Ask the LLM to try to understand the intent and point out places where the spreadsheet deviates from the intent. Also ask the LLM to narrate the spreadsheet so it can be understood.

That first condition "try to understand the intent" is where it could go wrong. Maybe it thinks the spreadsheet aligns with the intent, but it misunderstood the intent.

LLMs are a lossy validation, and while they work sometimes, when they fail they usually do so 'silently'.

Maybe we need some kind of method or framework for developing intent. Most of the things that go wrong in knowledge work come down to a lack of common understanding of intent.

> The one thing LLMs should consistently do is ensure that formatting is correct.

In JavaScript (and I assume most other programming languages) this is the job of static analysis tools (like eslint, prettier, typescript, etc.). I’m not aware of any LLM-based tools that perform static analysis with results as good as the traditional tools. Is static analysis not a thing in the spreadsheet world? Are the tools that do static analysis on spreadsheets subpar, or do they have some disadvantage not seen in other programming languages? And if so, are LLMs any better?

Just use a normal static analysis tool and feed the result to an LLM. I believe Anthropic correctly figured out that agents are the key, in addition to models, unlike OpenAI, which is run by a psycho who only believes in training the bigger model.
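As an example of static analysis that works without an LLM, here is a sketch of a classic spreadsheet check: flag formulas in a column that do not match the column's dominant pattern once references are rewritten as offsets (similar in spirit to R1C1 notation). The function names and the single-sheet dictionary representation are assumptions; findings like these are what could then be handed to an LLM for explanation.

```python
import re
from collections import Counter

_REF = re.compile(r"(?<![A-Z])(\$?)([A-Z]+)(\$?)(\d+)(?![\d(])")


def _col_num(letters: str) -> int:
    n = 0
    for ch in letters:
        n = n * 26 + ord(ch) - ord("A") + 1
    return n


def to_relative(formula: str, cell: str) -> str:
    """Rewrite A1 references as R1C1-style offsets from `cell`, so copied formulas compare equal."""
    col, row = re.fullmatch(r"([A-Z]+)(\d+)", cell).groups()
    base_col, base_row = _col_num(col), int(row)

    def rel(match: re.Match) -> str:
        col_anchor, c, row_anchor, r = match.groups()
        r_part = f"R{r}" if row_anchor else f"R[{int(r) - base_row}]"
        c_part = f"C{_col_num(c)}" if col_anchor else f"C[{_col_num(c) - base_col}]"
        return r_part + c_part

    return _REF.sub(rel, formula)


def inconsistent_formulas(sheet: dict, column: str) -> list[str]:
    """Formula cells in `column` that break the column's most common pattern."""
    patterns = {
        cell: to_relative(value, cell)
        for cell, value in sheet.items()
        if re.fullmatch(rf"{column}\d+", cell)
        and isinstance(value, str)
        and value.startswith("=")
    }
    if not patterns:
        return []
    dominant, _ = Counter(patterns.values()).most_common(1)[0]
    return sorted(cell for cell, pattern in patterns.items() if pattern != dominant)
```

Here a row that multiplies by the previous row's rate (`C4` reading `B3` instead of `B4`) is flagged, even though every individual formula looks plausible on its own.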

Last time I gave Claude an invoice and asked it to change one item on it, it did so nicely and gave me the new invoice. Good thing I noticed it had also changed the bank account number...

The more complicated the spreadsheet and the more dependencies it has, the greater the room for error. These are probabilistic machines. You can use them, I use them all the time for different things, but you need to treat them like employees you can't even trust to copy a bank account number correctly.
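Treating the model like an untrusted employee suggests checking its output mechanically. A minimal sketch, assuming the invoice has been extracted into a dictionary of fields (the field names below are hypothetical): compare before and after, and report any change outside the fields you asked to edit.

```python
def unexpected_changes(original: dict, edited: dict, allowed: set[str]) -> dict[str, tuple]:
    """Fields that differ between versions even though only `allowed` fields were meant to change."""
    return {
        field: (original.get(field), edited.get(field))
        for field in original.keys() | edited.keys()
        if field not in allowed and original.get(field) != edited.get(field)
    }
```

An empty result means only the requested fields moved; anything else, such as a bank account number, is surfaced instead of slipping through.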

We’ve tried to gently use them to automate some of our report generation and PDF->Invoice workflows, and it’s a nightmare of silent changes and absent logic: basic rules we spell out explicitly, like “debits need to match credits” and “balance sheets need to balance”, get ignored.
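Rules like these are better enforced as deterministic checks on the output than as instructions in the prompt. A minimal sketch using `Decimal` to avoid floating-point rounding; the journal-entry tuple layout `(account, debit, credit)` and the function names are assumptions:

```python
from decimal import Decimal


def check_journal(entries: list[tuple[str, Decimal, Decimal]]) -> list[str]:
    """Return readable violations for (account, debit, credit) rows; empty means balanced."""
    debits = sum((debit for _, debit, _ in entries), Decimal("0"))
    credits = sum((credit for _, _, credit in entries), Decimal("0"))
    if debits != credits:
        return [f"debits {debits} != credits {credits}"]
    return []


def check_balance_sheet(assets: Decimal, liabilities: Decimal, equity: Decimal) -> list[str]:
    """The accounting identity: assets = liabilities + equity."""
    if assets != liabilities + equity:
        return [f"assets {assets} != liabilities + equity {liabilities + equity}"]
    return []
```

Running these on every generated report turns "the model ignored the rule" from a silent failure into a loud one.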

Yeah, asking an LLM to edit one specific thing in a large or complex document/codebase is like those repeated "give me the exact same image" GIFs. It's fundamentally a statistical model, so the only thing we can be _certain_ of is that _it's not_. It might get the desired change 100% correct, but it's only gonna get the entire document 99.5% right.
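A back-of-the-envelope way to see why whole-document fidelity lags per-edit accuracy, under the simplifying (and unverified) assumption that each of N untouched elements is independently reproduced correctly with probability p:

$$
P(\text{nothing else changed}) = p^{N}, \qquad 0.999^{100} \approx 0.905, \qquad 0.999^{1000} \approx 0.368
$$

Even a 99.9% per-element rate leaves a sizeable chance of at least one silent change in a long document.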

Something that Claude Sonnet does when you use it to code is write scripts to test whether or not something is working. If it does that for Excel (e.g. some form of verification) it should be fine.

Besides, using AI is an exercise in a "trust but verify" approach to getting work done. If you asked a junior to do the task you'd check their output. Same goes for AI.

The use cases for spreadsheets are much more diverse than that. In my experience, spreadsheets are just as often used for calculation. Many of them do require high accuracy, rely on determinism, and necessitate an understanding of maths ranging from basic arithmetic to statistics and engineering formulas. Financial models, for example, must be built up from ground truth and need to always use the right formulas with the right inputs to generate meaningful outputs.

I have personally worked with spreadsheet based financial models that use 100k+ rows x dozens of columns and involve 1000s of formulas that transform those data into the desired outputs. There was very little tolerance for mistakes.

That said, humans, working in these use cases, make mistakes >0% of the time. The question I often have with the incorporation of AI into human workflows is, will we eventually come to accept a certain level of error from them in the way we do for humans?

Sysadmin of a small company. I get asked pretty often to help with a pivot table, VLOOKUP, or just general Excel functions (and smartsheet; these users LOVE smartsheet).

Indeed, in a small enough org, the sysadmin/technologist becomes support of last resort for all the things.

> these users LOVE smartsheet

I hate smartsheet…

Excel or R. (Or more often, regex followed by pen and paper followed by more regex.)

They're coming to me for pivot tables....

Handing them regex would be like giving a monkey a bazooka

>Does it have to be perfect? Also no.

Yeah, but it could be perfect. Why are there humans in the loop at all? That is all just math!