I have been thinking about this topic for some time. One option is to use the energy of the token: if the energy is still above an energy limit after processing, process the token again and increase the limit. The energy could be computed using log-sum-exp: https://openreview.net/pdf?id=Hkxzx0NtDB
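
A rough sketch of that loop, to make the idea concrete. It assumes the energy is the negative log-sum-exp of the token's logits. The names `step`, `to_logits`, `limit_increase` and `max_passes` are hypothetical placeholders for illustration, not part of any existing model or library, and the hard cap on passes is an extra assumption added here.

```python
import math
from typing import Callable, Sequence, Tuple, TypeVar

State = TypeVar("State")


def energy(logits: Sequence[float]) -> float:
    """Energy as the negative log-sum-exp of the logits (assumed convention).

    The maximum is subtracted before exponentiating for numerical stability.
    """
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))


def process_with_energy_limit(
    state: State,
    step: Callable[[State], State],                 # hypothetical: one processing pass
    to_logits: Callable[[State], Sequence[float]],  # hypothetical: read logits from state
    limit: float,
    limit_increase: float,
    max_passes: int,                                # assumption: hard cap on passes
) -> Tuple[State, int]:
    """Process a token repeatedly until its energy is at or below the limit."""
    passes = 0
    while passes < max_passes:
        state = step(state)
        passes += 1
        if energy(to_logits(state)) <= limit:
            break
        # Energy is still too high: raise the limit, then process again.
        limit += limit_increase
    return state, passes


if __name__ == "__main__":
    # Toy stand-in for a model: each pass doubles the logits.
    def sharpen(logits):
        return [2.0 * z for z in logits]

    final_state, n = process_with_energy_limit(
        [1.0, 0.0], sharpen, lambda s: s,
        limit=-3.0, limit_increase=0.5, max_passes=10,
    )
    print(n, energy(final_state))
```

Raising the limit after each failed check makes the test easier to pass over time, so a token that does not improve much still stops after a few passes. How fast to raise the limit, and whether a hard cap is needed at all, would have to be tuned.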