The key here is that you used screenshots. This forces Gemini into "OCR mode" (i.e. actually looking at vision tokens) rather than trying to be clever with its tool calls.

The latter strategy (leaning on tool calls) depends almost entirely on the quality of the skills and tools exposed to a given agent.