Cool tool. I tried a few different things to get it to work with google/gemini-2.5-pro, but couldn't figure it out.
uv add google-genai
uv run scripts/run_benchmarks.py --models google/gemini-2.5-pro --formats markdown_kv --limit 100
Unfortunately I started getting "quota exceeded" almost immediately, but it did give 6/6 correct answers before it crapped out.
Thanks! That worked perfectly.
100 samples:
- gemini-2.5-pro: 100%
- gemini-2.5-flash: 97%