"Reasoning 1" vs. "Reasoning 3" refers to the number of reasoning tokens inserted between each "text" token. The single-reasoning-token case is exactly what you see in the picture explanation in the article.
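
To make that concrete, here is a small sketch of the interleaving (my own illustration, assuming the reasoning tokens are simply inserted after every text token):

    # Hypothetical helper: insert k reasoning tokens after every text token.
    def interleave(text_tokens, k):
        out = []
        for t in text_tokens:
            out.append(t)
            out.extend(["<reasoning>"] * k)
        return out

    print(interleave(["3", "+", "2", "is", "5"], 1))  # "Reasoning 1"
    print(interleave(["3", "+", "2", "is", "5"], 3))  # "Reasoning 3"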

The generalization comes from making the network predict a <start reasoning> token on its own and close the reasoning span only when it predicts an <end reasoning> token. The training dataset for the upcoming experiment contains examples like:

    Q: What is 3+2?
    A: 3 + 2 is equal to <start reasoning> <reasoning> ... <reasoning> <end reasoning> 5
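
As a rough sketch of how those training strings could be assembled (the function name and the fixed reasoning count are my assumptions, not from the article):

    # Assumed special tokens; the real experiment may tokenize these differently.
    START, FILLER, END = "<start reasoning>", "<reasoning>", "<end reasoning>"

    def make_example(question, answer_prefix, answer, n_reasoning=3):
        # One training string: the reasoning span sits between the prefix and the final answer.
        span = " ".join([START] + [FILLER] * n_reasoning + [END])
        return f"Q: {question} A: {answer_prefix} {span} {answer}"

    print(make_example("What is 3+2?", "3 + 2 is equal to", "5"))
    # Q: What is 3+2? A: 3 + 2 is equal to <start reasoning> <reasoning> <reasoning> <reasoning> <end reasoning> 5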

Spending two tokens on the start/end reasoning markers seems expensive to me (a priori).

I am curious what that would yield though - in some ways that would be the most fun to analyze (when does it think a lot??)

I would also be curious to see at what point diminishing returns from reasoning tokens kick in (e.g. at a 1:10 ratio? higher?)