Interesting that Simon declared the pelican dead when Qwen 27B overtook Opus 4.7. That seems a strange criterion for deciding the utility of a benchmark, without more evidence. I think it stems from the assumption that Opus must be much larger. But I suspect that active parameters matter more than total parameters, and it is possible that the new Opus is a very sparse MoE with close to 27B active params.
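
To make the active-vs-total distinction concrete, here is a rough sketch of the arithmetic for a simplified MoE where each token passes through some shared weights plus a few routed experts. Every number below is made up for illustration; nothing here is a known or claimed configuration of Opus or any other real model, and `moe_param_counts` is just a hypothetical helper.

```python
def moe_param_counts(shared_params, expert_params, num_experts, experts_per_token):
    """Return (total, active) parameter counts for a simplified MoE.

    shared_params: weights every token uses (attention, embeddings, etc.)
    expert_params: weights in a single expert, summed over all layers
    """
    total = shared_params + expert_params * num_experts
    active = shared_params + expert_params * experts_per_token
    return total, active


# Hypothetical configuration, chosen only so the active count lands at 27B.
total, active = moe_param_counts(
    shared_params=11_000_000_000,
    expert_params=2_000_000_000,
    num_experts=128,
    experts_per_token=8,
)

print(f"{total / 1e9:.0f}B total, {active / 1e9:.0f}B active")
# prints "267B total, 27B active"
```

So under these assumed numbers a model could be roughly 10x larger in total size than a dense 27B while doing about the same amount of compute per token.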

  "there has been a direct correlation between the quality of the pelicans produced and the general usefulness of the models ...
 
  Today, even that loose connection to utility has been broken..." 
https://simonwillison.net/2026/Apr/16/qwen-beats-opus/