"This won't work because (something obvious that engineers at Anthropic clearly thought of already)"

Not really. Take for example:

item, date, price

abc, 01/01/2023, $30

cde, 02/01/2023, $40

... 100k rows ...

subtotal. $1000

def, 03/01/2023, $20

"Hey Claude, what's the total from this file?" > grep for headers > "Ah, I see column 3 is the price value" > SUM(C2:C) -> $2020 > "Great! I found your total!"

If you can find me an example of tech that can solve this at scale on large, diverse Excel formats, then I'll concede, but I haven't found anything actually trustworthy for important data sets.
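
To make the failure mode concrete, here is a minimal Python sketch. It uses a scaled-down, made-up version of the file (the $70 subtotal and the helper names `parse_price`, `naive_total`, and `filtered_total` are my own assumptions, not anything from a real tool). The naive sum plays the role of SUM(C2:C) and double-counts the subtotal row.

```python
import csv
import io

# Hypothetical scaled-down file; the subtotal value is invented so the
# arithmetic is easy to check by hand.
RAW = """item,date,price
abc,01/01/2023,$30
cde,02/01/2023,$40
subtotal,,$70
def,03/01/2023,$20
"""


def parse_price(text):
    """Turn a string like '$1,000' into a float."""
    return float(text.strip().lstrip("$").replace(",", ""))


def naive_total(raw):
    """Sum every value in the price column, like SUM(C2:C)."""
    rows = csv.DictReader(io.StringIO(raw))
    return sum(parse_price(r["price"]) for r in rows)


def filtered_total(raw, summary_labels=("subtotal", "total")):
    """Sum the price column, skipping rows labeled as summaries."""
    rows = csv.DictReader(io.StringIO(raw))
    return sum(
        parse_price(r["price"])
        for r in rows
        if r["item"].strip().lower() not in summary_labels
    )


print(naive_total(RAW))     # 160.0 (subtotal counted twice)
print(filtered_total(RAW))  # 90.0
```

Note that the filter only works because I already know how this file labels its summary rows. On diverse real-world spreadsheets that label could be anything, or missing entirely, which is the part I don't trust yet.
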

That's a basic tool call that current models can already do well. All the SQL query generation LLMs can do this, for example.