There are LLM performance trackers in the wild, for instance https://marginlab.ai.

You may notice that the measured performance of the outgoing model tends to decline shortly before each new model release.