At this point 'frontier model release' is a monthly cadence (Kimi 2.6, Claude 4.6, GPT 5.5). The interesting question is which evals will still be meaningful in 6 months.

More like weekly or almost daily. GPT 5.5 was literally 12 hours ago.