Did the ArtificialAnalysis team get bored or something? What makes a model worthy of benchmark inclusion?