wow, not bad result on the computer use benchmark for the mini model. for example, Claude Sonnet 4.6 shows 72.5%, almost on par with GPT-5.4 mini (72.1%). but sonnet costs 4x more on input and 3x more on output
wow, not bad result on the computer use benchmark for the mini model. for example, Claude Sonnet 4.6 shows 72.5%, almost on par with GPT-5.4 mini (72.1%). but sonnet costs 4x more on input and 3x more on output