newer models tend to use fewer thinking tokens to solve the same problems, and is a strong counterexample to your entire comment