The thinking tokens (even just 1024) make a massive difference in real-world tasks with 3.7, in my experience.