Author of related work here. This is very cool! I was hoping they would try to invert layer by layer from the output back to the input, but it seems they run a search at the input layer instead. They rightly point out that the residual connections make a layer-by-layer approach difficult. I would point out, though, that an RMSNorm layer should be invertible: because of the epsilon term in the denominator, the output magnitude still depends on the input magnitude, so the input magnitude can be recovered.
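
To make the RMSNorm point concrete, here is a minimal sketch in plain Python. It assumes the common form y_i = g_i * x_i / sqrt(mean(x^2) + eps) with every gain entry nonzero; the function names and the test values are made up for illustration and are not taken from the paper.

```python
import math

def rmsnorm(x, gain, eps):
    # Forward pass (assumed form): y_i = g_i * x_i / sqrt(mean(x^2) + eps)
    ms = sum(v * v for v in x) / len(x)
    scale = math.sqrt(ms + eps)
    return [g * v / scale for g, v in zip(gain, x)]

def rmsnorm_inverse(y, gain, eps):
    # Undo the learned gain (assumes no gain entry is zero).
    u = [v / g for g, v in zip(gain, y)]
    # s = mean(u^2) = ms / (ms + eps), strictly below 1 only because eps > 0.
    s = sum(v * v for v in u) / len(u)
    # Rearranging gives ms + eps = eps / (1 - s), i.e. the original scale.
    scale = math.sqrt(eps / (1.0 - s))
    return [v * scale for v in u]

# Hypothetical values, chosen only to exercise the round trip.
x = [0.5, -1.0, 2.0, 0.25]
gain = [1.0, 0.5, 2.0, 1.5]
eps = 1e-2

x_rec = rmsnorm_inverse(rmsnorm(x, gain, eps), gain, eps)
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
```

The catch is conditioning: 1 - s is roughly eps / mean(x^2), so with a realistic eps and low-precision activations the subtraction loses most of its significant digits. The inverse exists in exact arithmetic, but recovering the magnitude in practice may be much noisier than this float64 toy suggests.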

What is meant by "residual connections" here?