Hacker News

charleshmartin a day ago

Right. If the dynamics of training are governed by renormalization group (RG) flow, then the best optimization path should remove the redundant directions, as specified by the RG operator(s).