Is there a good technical breakdown somewhere of all the benchmarks that get used to market the latest and greatest LLMs? Preferably impartial.

I just ask Claude about each one and ask for sources.

Reminds me of how, if you make a complaint against a lawyer or a judge, it's evaluated by lawyers and judges.