The notes explicitly call out that you may want to dial the effort setting back to medium to reduce latency and token usage (high is the default, and apparently there is a max setting too).

There are three options to choose from on /model: low, medium, and high effort.