A better tokens-per-dollar figure is useless as a basis for comparison if the model can't solve your problem.
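One way to make this concrete is to compare models on expected cost per solved task rather than on tokens per dollar. The sketch below is only an illustration: the function name, its parameters, and every number in the usage lines are hypothetical, and it assumes each retry is independent with the same success rate, so the expected number of attempts is 1 / success_rate.

```python
import math


def cost_per_solved_task(price_per_million_tokens: float,
                         tokens_per_attempt: float,
                         success_rate: float) -> float:
    """Expected dollars spent per solved task.

    Hypothetical helper: assumes independent retries with a fixed
    success rate, so expected attempts = 1 / success_rate.
    """
    cost_per_attempt = price_per_million_tokens * tokens_per_attempt / 1_000_000
    if success_rate <= 0:
        # A model that never solves the task has unbounded cost per solution,
        # no matter how cheap its tokens are.
        return math.inf
    return cost_per_attempt / success_rate


# Made-up numbers, for illustration only.
cheap_but_failing = cost_per_solved_task(0.5, 2000, 0.0)
pricey_but_capable = cost_per_solved_task(10.0, 2000, 0.8)
print(cheap_but_failing, pricey_but_capable)
```

Under these assumptions, the model with cheaper tokens comes out infinitely expensive per solution, because a zero success rate means no amount of spending ever produces a solved task.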