It is kind of noisy because release recency, which is what your "age" column actually represents, isn't important for the comparison you are trying to make.

Also, the message we're supposed to take from that table isn't really obvious.

Okay, I think there's a familiarity delta. I run into this constantly.

I know Artificial Analysis quite well as the gold standard in LLM evals.

But I guess they're still obscure.

I didn't think they were.

The age is important because new techniques keep being developed, so it's a very rough indicator of the size/cost/efficiency trade-off.

A model's age is a major indicator of what you can expect from it.

I really need to develop a better sense of what people know. That's only one of my problems.

Thanks for engaging with me.