I think the techniques in “Weight Agnostic Neural Networks” should be applicable here, too. It uses a variant of NEAT I believe. This would allow for learning the topology and wiring rather than just gates. But, in practice it is probably pretty slow, and may not be all that different than a pruned and optimized DLGN..

https://weightagnostic.github.io/