Yes, I think training this model would be hard. Perhaps something akin to how MoEs are trained where you impose some sort of loss distribution to encourage equitable routing, but for recursion.
Yes, I think training this model would be hard. Perhaps something akin to how MoEs are trained where you impose some sort of loss distribution to encourage equitable routing, but for recursion.
Look at the human brain for useful analogies?
The default mode network does recursive/looping processing in the absence of external stimuli and world interaction. Multiple separate modules outside of the network are responsible for stopping and regulating this activity.