Have you tried giving the model multiple pages at a time? It might transcribe better since it has more context to work with.
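
A rough sketch of the batching idea, assuming the pages are already available as a list. The helper name `page_batches` and the file names are made up for illustration, and the actual model call is left out since it depends on your setup.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def page_batches(pages: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive groups of `size` pages; the last group may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(pages), size):
        yield pages[start:start + size]


# Hypothetical page identifiers; in practice these would be page images or files.
pages = [f"page_{n}.png" for n in range(1, 8)]
batches = list(page_batches(pages, size=3))

for batch in batches:
    # Each batch would go to the model in a single request (not shown here).
    print(batch)
```

The batch size is something to tune: bigger batches give more context per request, but they also lean harder on the model's long-context ability.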

Gemini 3's long-context performance is not as good as Gemini 2.5's.