I don't believe that trackers like this are trustworthy. There's an enormous financial motive to cheat and these companies have a track record of unethical conduct.

If I was VP of Unethical Business Strategy at OpenAI or Anthropic, the first thing I'd do is put in place an automated system which flags accounts, prompts, IPs, and usage patterns associated with these benchmarks and direct their usage to a dedicated compute pool which wouldn't be affected by these changes.