I am surprised it is not tested on newer models: GPT-5, Gemini 2.5 Pro, and Claude 4 Sonnet and Opus.