I’ve been using Gemini 3 Pro on a historical document archiving project for an old club. One of the guys had been scanning old handwritten minutes books, written in German and challenging to read (1885 through 1974). Anyways, I was getting decent results on a first pass with 50-page chunks but ended up doing 1 page at a time (accuracy probably 95%). For each page, I submit the page for a transcription pass, followed by a translation of the returned transcription. That's about 2370 pages, sitting at about $50 in Gemini API billing. The output will need manual review, but the time savings are impressive.

Suggestion: run the identical prompt N times (2 identical calls to Gemini 3.0 Pro + 2 identical calls to GPT 5.2 Thinking), then run some basic text post-processing to see where the 4 responses agree vs disagree. The disagreements (substrings that aren't identical matches) are where scrutiny is needed. But if all 4 agree on some substring, it's almost certainly a correct transcription. Wouldn't be too hard to get codex to vibe code all this.
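
A rough sketch of that post-processing step, assuming the 4 responses are already in hand as plain strings. The function names and the [[...]] marker are made up for illustration, and it compares word by word against the first response using Python's difflib:

```python
from difflib import SequenceMatcher


def agreement_mask(transcripts):
    """Use the first transcript as the reference and report, per word,
    whether every other transcript matched that word."""
    ref = transcripts[0].split()
    mask = [True] * len(ref)
    for other in transcripts[1:]:
        matched = [False] * len(ref)
        sm = SequenceMatcher(a=ref, b=other.split(), autojunk=False)
        # Each matching block is a run of words identical in both versions.
        for block in sm.get_matching_blocks():
            for i in range(block.a, block.a + block.size):
                matched[i] = True
        mask = [m and ok for m, ok in zip(mask, matched)]
    return ref, mask


def flag_disagreements(transcripts):
    """Return the reference text with disputed words wrapped in [[...]]."""
    ref, mask = agreement_mask(transcripts)
    return " ".join(w if ok else f"[[{w}]]" for w, ok in zip(ref, mask))


if __name__ == "__main__":
    runs = [
        "Herr Schmidt opened the meeting",
        "Herr Schmidt opened the meeting",
        "Herr Schmitt opened the meeting",
        "Herr Schmidt opened the meeting",
    ]
    print(flag_disagreements(runs))
```

One caveat with this version: it only flags words that appear in the first response, so a word that other runs insert but the first run dropped would go unnoticed. Rotating which response is the reference would catch those.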

Have you tried providing multiple pages at a time to the model? It might do better transcription, as it has a bigger context to work with.

Gemini 3's long context is not as good as Gemini 2.5's.

It sounds like a job where one pass might also be a viable option. Until you do the manual review you won't have a full sense of the time savings involved.

Good idea. I’ll try modifying the prompt to transcribe, identify the language, and translate if not English, and then return a structured result. In my spot checks, most of the errors are in people’s names and in places where the handwriting trails into the margins (especially into the fold of the binding). Even with the data still needing review, the translations have revealed a lot of interesting characters, as well as this little anecdote from the minutes of the June 6, 1941 Annual Meeting:

It had already rained at the beginning of the meeting. During the same, however, a heavy thunderstorm set in, whereby our electric light line was put out of operation. Wax candles with beer bottles as light holders provided the lighting. In the meantime the rain had fallen in a cloudburst-like manner, so that one needed help to get one's automobile going. In some streets the water stood so high that one could reach one's home only by detours. In this night 9.65 inches of rain had fallen.
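
For the structured result mentioned before the anecdote, something like this could work on the receiving end. It is only a sketch: the JSON field names (transcription, language, translation) and the "en" language code are assumptions about what the prompt would ask for, not anything Gemini defines.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageResult:
    transcription: str
    language: str               # assumed to be a short code such as "de" or "en"
    translation: Optional[str]  # None when the page is already English


def parse_page_result(raw: str) -> PageResult:
    """Parse one page's JSON reply and enforce the translate-if-not-English rule."""
    data = json.loads(raw)
    language = data["language"].strip().lower()
    translation = data.get("translation") or None
    if language != "en" and translation is None:
        raise ValueError("non-English page is missing a translation")
    return PageResult(data["transcription"], language, translation)


if __name__ == "__main__":
    reply = '{"transcription": "Es hatte geregnet.", "language": "de", "translation": "It had rained."}'
    print(parse_page_result(reply))
```

Failing loudly on a missing translation makes it easy to collect the pages that need a retry instead of finding the gaps during manual review.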

One discovery I've made with Gemini is that OCR accuracy is much higher when the document is perfectly aligned at 0 degrees. When we gave Gemini images of handwritten text that were rotated (90 or 180 degrees), it had lots of issues reading dates, names, etc. Once we used the PaddleOCR image orientation model to find the orientation and rotate the image, most of our OCR issues went away.
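
The rotate-back step is simple once an orientation is known. Here is a minimal sketch that does not call PaddleOCR at all: the detected angle is taken as an input, a plain 2D list stands in for the image pixels, and it assumes the angle means "the page was turned this many degrees clockwise". A real pipeline would use an image library and should check which convention its orientation model reports.

```python
def rotate_cw_90(grid):
    """Rotate a row-major 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate_cw(grid, degrees):
    """Rotate clockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    for _ in range((degrees % 360) // 90):
        grid = rotate_cw_90(grid)
    return grid


def deskew(grid, detected_deg):
    """Undo a detected clockwise rotation so the page ends up at 0 degrees.

    Assumption: detected_deg is how far the upright page was turned
    clockwise, so the fix is the remaining clockwise turn to 360.
    """
    return rotate_cw(grid, (360 - detected_deg) % 360)


if __name__ == "__main__":
    upright = [[1, 2, 3],
               [4, 5, 6]]
    scanned = rotate_cw(upright, 90)  # simulate a sideways scan
    print(deskew(scanned, 90) == upright)
```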

They could likely increase their budget slightly and run an LLM-based judge.