uv add google-genai
uv run scripts/run_benchmarks.py --models google/gemini-2.5-pro --formats markdown_kv --limit 100
And add GOOGLE_API_KEY=<your-key-here> to a file called .env in the repo root.

Unfortunately I started getting "quota exceeded" almost immediately, but it did give 6/6 correct answers before it crapped out.
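In case it helps anyone debugging the .env step: I don't know how the benchmark script actually reads that file, so the following is only a rough stdlib sketch of what loading it usually amounts to. The helper name `load_env_file` is made up for illustration and is not part of the repo or of google-genai.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines and export them to os.environ.

    Hypothetical helper for illustration; the real script may use a
    different loader.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # no .env file, nothing to load
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # setdefault keeps a value that is already set in the shell.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


if __name__ == "__main__":
    load_env_file()
    print("GOOGLE_API_KEY set:", "GOOGLE_API_KEY" in os.environ)
```

Running this from the repo root is a quick way to check that the key is actually being picked up before blaming the API.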
Thanks! That worked perfectly.
100 samples:
- gemini-2.5-pro: 100%
- gemini-2.5-flash: 97%