Checking someone else's spreadsheet is a fucking nightmare. If your company has extremely good standards it's less miserable because at least the formatting etc. will be consistent...

The one thing LLMs should consistently do is ensure that formatting is correct, which will help greatly in the checking process. But no, I generally don't trust them to do sensible things with basic formulation. Not a week ago GPT 5 got confused about whether a plus or a minus was necessary in a basic question of "I'm 323 days old, when is my birthday?"
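
For reference, the sign in that question is easy to pin down with date arithmetic. A minimal sketch, assuming "323 days old" means 323 days have passed since birth; the reference date and both function names are made up for illustration:

```python
from datetime import date, timedelta

def date_of_birth(today: date, age_in_days: int) -> date:
    # Being N days old means birth happened N days ago, so subtract.
    return today - timedelta(days=age_in_days)

def first_birthday(today: date, age_in_days: int) -> date:
    born = date_of_birth(today, age_in_days)
    try:
        return born.replace(year=born.year + 1)
    except ValueError:
        # Born on Feb 29: falling back to Feb 28 is an assumption of this sketch.
        return born.replace(year=born.year + 1, day=28)

today = date(2025, 6, 1)           # hypothetical "today"
print(date_of_birth(today, 323))   # 2024-07-13
print(first_birthday(today, 323))  # 2025-07-13
```

So it is a minus to find the date of birth, then a plus of one year to find the next birthday.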

I think you have a misunderstanding of the types of things that LLMs are good at. Yes you're 100% right that they can't do math. Yet they're quite proficient at basic coding. Most Excel work is similar to basic coding so I think this is an area where they might actually be pretty well suited.

My concern would be more with how to check the work (i.e., make sure that the formulas are correct and no columns are missed) because Excel hides all that. Unlike code, there's no easy way to generate the diff of a spreadsheet or rely on Git history. But that's different from the concerns that you have.
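
To make the diff idea concrete: once the formulas are dumped to text, a cell-level diff is not much code. A minimal sketch, assuming each version of the sheet has already been exported to a dict of cell address to raw cell content (the export step and the name `diff_formulas` are hypothetical, not part of any Excel API):

```python
def diff_formulas(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return one line per cell whose raw content differs between two snapshots."""
    lines = []
    # Plain string sort: good enough for a sketch, but "B10" sorts before "B2".
    for cell in sorted(old.keys() | new.keys()):
        before, after = old.get(cell), new.get(cell)
        if before == after:
            continue
        if before is None:
            lines.append(f"+ {cell}: {after}")
        elif after is None:
            lines.append(f"- {cell}: {before}")
        else:
            lines.append(f"~ {cell}: {before} -> {after}")
    return lines

old = {"A1": "10", "B1": "=A1*2", "C1": "=SUM(A1:B1)"}
new = {"A1": "10", "B1": "=A1*3", "D1": "=B1"}
for line in diff_formulas(old, new):
    print(line)
# ~ B1: =A1*2 -> =A1*3
# - C1: =SUM(A1:B1)
# + D1: =B1
```

This compares formulas as text only; it says nothing about whether a changed formula is right.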

> Yes you're 100% right that they can't do math.

The model ought to be calling out to some sort of tool to do the math—effectively writing code, which it can do. I'm surprised the major LLM frontends aren't always doing this by now.
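
A rough idea of what such a tool could look like on the receiving end. This is a sketch under my own assumptions, not how any particular LLM frontend does it: the model emits an arithmetic expression as a string, and a small evaluator (the name `calculate` is hypothetical) computes it while rejecting anything that is not plain arithmetic:

```python
import ast
import operator

# Only these operators are allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculate(expression: str):
    """Evaluate +, -, *, / on numeric literals without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

print(calculate("365 - 323"))    # 42
print(calculate("2 * (3 + 4)"))  # 14
```

The model still has to pick the right expression, so this removes arithmetic slips but not the plus-versus-minus kind of mistake.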

The MS Office Tools menu has a "Spreadsheet Compare" application. It is quite good for diffing two spreadsheets. Of course it cannot catch logic errors, human or ML.

I've built spreadsheet diff tools on Google Sheets multiple times. As the need grows, I think we will see diffs, commits, and review tools reach customers.

Hey Collin! I am working on an AI agent on Google Sheets, and I am curious whether any of your designs are out in public. We are trying to rethink how diffs should look and want to make something nicer than what we currently have.

Hi! Nothing public nor generic enough to be a good building block. I found myself often frustrated by the tools that came out of the box, but I believe better APIs could make this slightly easier to solve.

The UX of spreadsheet diffs is a hard one to solve because of how weird the calculation loops are and how complicated the relationship between fields might be.

I've never tried to solve this for a real end user before in a generic way - all my past work here was for internal ability to audit changes and roll back catastrophes. I took a lot of shortcuts by knowing which cells are input data vs various steps of calculations -- maybe part of your UX is being able to define that on a sheet-by-sheet basis? Then you could show how different data (same formulas) changed outputs, or how different formulas (same data) changed them?
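
The input-versus-calculation split can be sketched in a few lines. This is only an illustration, assuming each snapshot is a dict of cell address to raw content and that anything starting with "=" counts as a calculation step (a real sheet would need the per-sheet definition mentioned above); `classify_changes` is a hypothetical name:

```python
def classify_changes(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Split changed cells into edited input data and edited formulas."""
    changes: dict[str, list[str]] = {"inputs": [], "formulas": []}
    for cell in sorted(old.keys() | new.keys()):
        before, after = old.get(cell, ""), new.get(cell, "")
        if before == after:
            continue
        # Assumption: a leading "=" marks a calculation, everything else is input.
        is_formula = before.startswith("=") or after.startswith("=")
        changes["formulas" if is_formula else "inputs"].append(cell)
    return changes

old = {"A1": "100", "A2": "200", "B1": "=A1+A2"}
new = {"A1": "150", "A2": "200", "B1": "=A1-A2"}
print(classify_changes(old, new))
# {'inputs': ['A1'], 'formulas': ['B1']}
```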

Spreadsheets are basically weird app platforms at this point so you might not be able to create a single experience that is both deep and generic. On the other hand maybe treating it as an app is the unlock? Get your AI to noodle on what the whole thing is for, then show diff between before and after stable states (after all calculation loops stabilize or are killed) side by side with actual diffs of actual formulas? I feel like I'd want to see a diff as a live final spreadsheet and be able to click on changed cells and see up the chain of their calculations to the ancestors that were modified.
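
The "click a changed cell and walk up to the modified ancestors" part is basically a graph traversal. A toy sketch, assuming a single sheet stored as a dict of address to raw content and a naive regex for references (it only sees single-cell references and range endpoints, and would misread function names such as LOG10); all names here are hypothetical:

```python
import re

# Matches A1, $A1, A$1, $A$1; the dollar signs are ignored for dependency purposes.
CELL_REF = re.compile(r"\$?([A-Z]+)\$?(\d+)")

def precedents(formula: str) -> set[str]:
    """Cells that a formula refers to directly."""
    if not formula.startswith("="):
        return set()
    return {col + row for col, row in CELL_REF.findall(formula)}

def modified_ancestors(cell: str, sheet: dict[str, str], modified: set[str]) -> set[str]:
    """Walk up the dependency chain from `cell` and collect the modified ancestors."""
    found, seen, stack = set(), set(), [cell]
    while stack:
        current = stack.pop()
        for parent in precedents(sheet.get(current, "")):
            if parent in seen:
                continue
            seen.add(parent)  # also guards against circular references
            if parent in modified:
                found.add(parent)
            stack.append(parent)
    return found

sheet = {"A1": "5", "A2": "7", "B1": "=A1*2", "B2": "=A2+1", "C1": "=B1+B2"}
print(modified_ancestors("C1", sheet, modified={"A1"}))  # {'A1'}
```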

Fun problem that sounds extremely complicated. Good luck distilling it!

> Most Excel work is similar to basic coding

Excel is similar to coding in BASIC, a giant hairy ball of tangled wool.

So do it in basic code where numbering your line G53 instead of G$53 doesn't crash a mass transit network because somebody's algorithm forgot to order enough fuel this month.
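
For anyone who hasn't been bitten by this: G53 is a relative reference and G$53 pins the row, so the two behave differently the moment a formula is copied down. A simplified sketch of that shifting rule, assuming plain A1-style references on one sheet (the function `fill_down` is hypothetical and ignores ranges across sheets, named ranges, and text that merely looks like a reference):

```python
import re

# Groups: optional "$" before the column, column letters, optional "$" before the row, row number.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def fill_down(formula: str, rows: int) -> str:
    """Shift relative row references by `rows`, as copying a formula down would."""
    def shift(match: re.Match) -> str:
        col_abs, col, row_abs, row = match.groups()
        if not row_abs:  # relative row: moves with the copy
            row = str(int(row) + rows)
        return f"{col_abs}{col}{row_abs}{row}"
    return REF.sub(shift, formula)

print(fill_down("=F2*G53", 1))   # =F3*G54   (the constant in G53 is silently left behind)
print(fill_down("=F2*G$53", 1))  # =F3*G$53  (still points at the intended cell)
```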

proficient != near-flawless.

> Most Excel work is similar to basic coding so I think this is an area where they might actually be pretty well suited.

This is a hot take. One I'm not sure many would agree with.

Excel work by people who make a living because of their Excel skills (bankers, VCs, finance pros) is truly on the spectrum of basic coding. Excel use by others (strategy, HR, etc.) is more like a crude UI to manipulate small datasets (filter, sort, add, share, and collaborate). Source: have lived both lives.

Maybe LLMs will enable a new type of work in spreadsheets. Just like in coding we have PR reviews, with an LLM it should be possible to do a spreadsheet review. Ask the LLM to try to understand the intent and point out places where the spreadsheet deviates from the intent. Also ask the LLM to narrate the spreadsheet so it can be understood.

That first condition "try to understand the intent" is where it could go wrong. Maybe it thinks the spreadsheet aligns with the intent, but it misunderstood the intent.

LLMs are a lossy form of validation, and while they work sometimes, when they fail they usually do so 'silently'.

Maybe we need some kind of method or framework to develop intent. Most of the things that go wrong in knowledge work come down to a lack of common understanding of intent.

> The one thing LLMs should consistently do is ensure that formatting is correct.

In JavaScript (and I assume most other programming languages) this is the job of static analysis tools (like eslint, prettier, typescript, etc.). I'm not aware of any LLM-based tool which performs static analysis with results as good as the traditional tools. Is static analysis not a thing in the spreadsheet world? Are the tools which do static analysis on spreadsheets subpar, or do they have some disadvantage not seen in other programming languages? And if so, are LLMs any better?
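
To give a flavor of what a lint rule for spreadsheets might look like, here is a toy check that flags the odd formula out in a column. It is a sketch under my own assumptions, not a description of any existing tool: a column is a dict of row number to formula, references are plain A1-style, and `normalize` and `inconsistent_rows` are hypothetical names:

```python
import re
from collections import Counter

# Groups: column part (with optional "$"), optional "$" before the row, row number.
REF = re.compile(r"(\$?[A-Z]+)(\$?)(\d+)")

def normalize(formula: str, row: int) -> str:
    """Rewrite relative row references as offsets from the formula's own row."""
    def repl(match: re.Match) -> str:
        col, row_abs, ref_row = match.groups()
        if row_abs:  # absolute row: keep as written
            return match.group(0)
        return f"{col}[{int(ref_row) - row:+d}]"
    return REF.sub(repl, formula)

def inconsistent_rows(column: dict[int, str]) -> list[int]:
    """Return rows whose normalized formula differs from the most common shape."""
    shapes = {row: normalize(formula, row) for row, formula in column.items()}
    if not shapes:
        return []
    common, _ = Counter(shapes.values()).most_common(1)[0]
    return sorted(row for row, shape in shapes.items() if shape != common)

column_c = {2: "=A2*B2", 3: "=A3*B3", 4: "=A4*B3", 5: "=A5*B5"}
print(inconsistent_rows(column_c))  # [4]
```

Row 4 multiplies by B3 instead of B4, which is exactly the kind of slip that is hard to spot by eye and trivial for a deterministic check.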

Just use a normal static analysis tool and shove the result to an LLM. I believe Anthropic properly figured that agents are the key, in addition to models, contrary to OpenAI that is run by a psycho that only believes in training the bigger model.
