The idea is that, instead of spending 10,000 thinking tokens on a single chain of thought, you spend 1,000 thinking tokens on each of 10 independent chains of thought and compose their outputs into a single answer, which yields better results.
The fact that it can be done in parallel is just a bonus.
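
To make the budget split concrete, here is a minimal Python sketch. The names `split_budget`, `compose_by_majority`, `run_parallel_chains`, and `fake_chain` are all hypothetical. Majority voting is only one assumed way to compose the outputs, since the text does not say how composition is done, and `fake_chain` is a stand-in for a real model call.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_budget(total_tokens: int, num_chains: int) -> List[int]:
    """Divide a total thinking-token budget as evenly as possible."""
    base, extra = divmod(total_tokens, num_chains)
    return [base + (1 if i < extra else 0) for i in range(num_chains)]


def compose_by_majority(answers: List[str]) -> str:
    """Assumed composition rule: return the most common answer."""
    return Counter(answers).most_common(1)[0][0]


def run_parallel_chains(
    prompt: str,
    total_tokens: int,
    num_chains: int,
    run_chain: Callable[[str, int, int], str],
) -> str:
    """Run independent chains concurrently and compose their answers.

    run_chain(prompt, token_budget, seed) is a hypothetical callable that
    wraps whatever model API is actually in use.
    """
    budgets = split_budget(total_tokens, num_chains)
    with ThreadPoolExecutor(max_workers=num_chains) as pool:
        answers = list(
            pool.map(
                lambda item: run_chain(prompt, item[1], item[0]),
                enumerate(budgets),
            )
        )
    return compose_by_majority(answers)


def fake_chain(prompt: str, token_budget: int, seed: int) -> str:
    """Stand-in for a model call: a minority of chains give a wrong answer."""
    return "41" if seed % 4 == 0 else "42"


if __name__ == "__main__":
    # 10 chains of 1,000 tokens each instead of one chain of 10,000.
    print(run_parallel_chains("What is 6 * 7?", 10_000, 10, fake_chain))
```

Because the chains never read each other's output, they can run concurrently, which is where the parallelism bonus comes from. Whether the composed answer actually beats a single long chain depends on the task and the model, which this sketch does not measure.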