Hacker News

Cool tool. I tried a few different things to get it to work with google/gemini-2.5-pro, but couldn't figure it out.

    uv add google-genai
    uv run scripts/run_benchmarks.py --models google/gemini-2.5-pro --formats markdown_kv --limit 100

And add GOOGLE_API_KEY=<your-key-here> to a file called .env in the repo root.
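If the run fails before making any requests, it may be worth checking that the key is actually visible. Here is a rough stdlib-only sketch for that; `load_env_file` and `has_google_key` are made-up helper names for illustration, not part of the repo, and I'm only assuming the `.env` file uses plain `KEY=VALUE` lines (how the tool itself loads the file may differ).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical helper: assumes a plain KEY=VALUE layout, nothing fancier.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_google_key(path=".env"):
    """Return True if GOOGLE_API_KEY is set in the environment or in the file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(path).get("GOOGLE_API_KEY")
    return bool(key)


if __name__ == "__main__":
    print("GOOGLE_API_KEY found:", has_google_key())
```

Run it from the repo root so the relative `.env` path resolves.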

Unfortunately I started getting "quota exceeded" almost immediately, but it did give 6/6 correct answers before it crapped out.

xnx 2 days ago

Thanks! That worked perfectly.

100 samples:

- gemini-2.5-pro: 100%

- gemini-2.5-flash: 97%