I wonder if some of those synthetics that specifically burn in an attention inductive bias could help there, i.e. by getting attention to converge faster than it normally would?