It's a neat idea, and not too dissimilar in spirit from gradient boosting. The point about credit assignment is crucial, and it's the same reason most architectures and initialization methods are so convoluted nowadays.
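To make the gradient-boosting comparison concrete, here's a minimal stagewise-fitting sketch (my own toy example in Python; the sine data and stump weak learners are arbitrary illustrative choices, not anything from the paper). Each stage is fit greedily to the current residual, so credit assignment stays local to that stage rather than flowing back through everything trained before it:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem (hypothetical data, purely for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

n_stages, lr = 50, 0.1
pred = np.zeros_like(y)
stages = []
for _ in range(n_stages):
    # Each stage is fit greedily to the current residual: credit assignment
    # is local to this stage and never propagated back through earlier ones.
    residual = y - pred
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    pred += lr * stump.predict(X)
    stages.append(stump)
```

That locality is the spiritual overlap, as I read it: greedy stagewise training buys you easy per-stage credit assignment, at the cost of never jointly optimizing the whole stack.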
I don't really like one of their premises, and the conclusion they draw from it:
> that does not learn hierarchical representations
There's an implicit assumption here that (a) traditional networks do learn hierarchical representations, (b) that's bad, and (c) this training method does not learn them. But (a) is situational: it's easy to construct datasets where a standard gradient-descent neural net will learn a different way, even with a reversed hierarchy. (b) is unproven and also doesn't make a lot of intuitive sense to me. And (c) has no evidence even in this paper, where the claim is made, and doesn't seem likely to be true either.