Hacker News
redman25 3 months ago

https://www.swebench.com

https://swe-rebench.com

https://livebench.ai/#/

https://eqbench.com/#

https://contextarena.ai/?needles=8

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

https://artificialanalysis.ai/leaderboards/models

https://gorilla.cs.berkeley.edu/leaderboard.html

https://github.com/lechmazur/confabulations

https://dubesor.de/benchtable

https://help.kagi.com/kagi/ai/llm-benchmark.html

https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard

Alifatisk 3 months ago

I’d stick to Artificial Analysis.

pylotlight 3 months ago

That has many of its own problems as well.