With Opus it’s hard to tell what was due to the tokenizer changes. Maybe using more tokens for the same prompt means the model effectively thinks more?