I've gotten better results by telling it "write a Python program to calculate X"

For the tax thing. I had Claude write a CLI and a prompt for Gemini Flash 2.5 to do the structured extraction: i.e. .pdf -> JSON. The JSON schema was pretty flexible, and open to interpretation by Gemini, so it didn't produce 100% consistent JSON structures.

To then "aggregate" all of the json outputs, I had Claude look at the json outputs, and then iterate on a Python tool to programmatically do it. I saw it iterating a few times on this: write the most naive Python tool, run it, throws exception, rinse and repeat, until it was able to parse all the json files sensibly.

Yeah, in my user prompt I have "Whenever you are asked to perform any operation which could be done deterministically by a program, you should write a program to do it that way and feed it the data, rather than thinking through the problem on your own." It's worked wonders.

[deleted]

Good call. I’ve also had better results pre-processing PDFs, extracting data into structured format, and then running prompts against that.

Which should pair well with the “write a script” tactic.

Yeah, asking for a tool to do a thing is almost always better than asking for the thing directly, I find. LLMs are kind of not there in terms of always being correct with large batches of data. And when you ask for a script, you can actually verify what's going on in there, without taking leaps of faith.