My read: 4.7 was a tactical lobotomy to improve the average experience at the expense of peak performance, made necessary by compute pressure.

Now that they have Colossus capacity, I guess they can tune up the intelligence again and spend more tokens on reasoning.

4.7 was definitely a lot flakier for me than 4.6 was before the reasoning bugs.