One of the easier solutions that does the rounds periodically (in many forms beyond logic gates, such as symbolic regression) is to just densely connect everything and ensure that an identity function exists. Anneal an L1 penalty on the non-identity nodes, and you can learn a sparse representation.
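A minimal sketch of that idea in JAX — every concrete choice here (the soft gate relaxations, softmax input selection, the penalty schedule) is my own illustration, not a canonical recipe. Each node soft-selects two inputs and a mixture over gate types, gate 0 is the identity, and an annealed L1 penalty on the non-identity mixture mass pushes nodes toward being plain wires:

    import jax
    import jax.numpy as jnp

    def soft_gates(a, b):
        # Differentiable relaxations of the gates, valid for values in [0, 1].
        # Index 0 is the identity on input a, so the penalty below can anneal
        # a node all the way down to "just a wire".
        return jnp.stack([a, a * b, a + b - a * b, a + b - 2 * a * b], axis=-1)

    def layer(p, prev):
        # prev: (n_prev, batch). Each node soft-selects two inputs from the
        # previous layer ("densely connected") and a mixture over gate types.
        a = jax.nn.softmax(p["sel_a"], axis=-1) @ prev        # (n, batch)
        b = jax.nn.softmax(p["sel_b"], axis=-1) @ prev        # (n, batch)
        g = jax.nn.softmax(p["gate"], axis=-1)                # (n, 4)
        out = jnp.sum(g[:, None, :] * soft_gates(a, b), axis=-1)
        return out, jnp.sum(g[:, 1:])   # L1 mass sitting off the identity gate

    def loss(params, x, y, lam):
        h, penalty = x, 0.0
        for p in params:
            h, pen = layer(p, h)
            penalty += pen
        return jnp.mean((h - y) ** 2) + lam * penalty

    def init(key, sizes):
        layers = []
        for n_prev, n in zip(sizes[:-1], sizes[1:]):
            key, ka, kb, kg = jax.random.split(key, 4)
            layers.append({"sel_a": 0.1 * jax.random.normal(ka, (n, n_prev)),
                           "sel_b": 0.1 * jax.random.normal(kb, (n, n_prev)),
                           "gate":  0.1 * jax.random.normal(kg, (n, 4))})
        return layers

    # Learn XOR; ramp lam up from zero so the sparsity penalty only bites
    # once the circuit already fits the data.
    x = jnp.array([[0., 0., 1., 1.], [0., 1., 0., 1.]])   # (2 inputs, 4 samples)
    y = jnp.array([[0., 1., 1., 0.]])                     # (1 output, 4 samples)
    params = init(jax.random.PRNGKey(0), [2, 4, 1])
    grad_fn = jax.jit(jax.grad(loss))
    for step in range(5000):
        lam = 1e-2 * step / 5000
        grads = grad_fn(params, x, y, lam)
        params = jax.tree_util.tree_map(lambda p, g: p - 0.5 * g, params, grads)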

There are a number of details to work through, such as defining an "identity" for gates with 2 inputs and 1 output (simplest answer: don't offer those; use 2-in/2-out gates like a half adder instead of bare AND or XOR, and add a post-processing step that removes the extra wires you don't care about), or defining "densely connected" in a way that doesn't explode combinatorially (many workable solutions; the details only matter a little). But it's the brute-force solution, and you only pay the cost during training.
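For the first detail, a toy 2-in/2-out gate basis that drops into the sketch above (the specific gate set, including the swap, is just an illustrative choice): with two outputs a true identity exists, and the half adder covers XOR and AND in one gate:

    import jax.numpy as jnp

    def soft_gates_2out(a, b):
        # 2-in/2-out basis so a true identity exists: gate 0 passes both
        # inputs through untouched, which is what the L1 penalty anneals
        # toward. The half adder subsumes XOR (sum) and AND (carry); any
        # output the final circuit doesn't use gets pruned afterwards.
        return jnp.stack([
            jnp.stack([a, b], axis=0),                      # identity: both wires
            jnp.stack([a + b - 2 * a * b, a * b], axis=0),  # half adder: (sum, carry)
            jnp.stack([b, a], axis=0),                      # swap: free wire crossing
        ], axis=-1)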

There are lots of other fun ways to handle that problem though. One of my favorites is to represent your circuit as "fields" rather than discrete nodes. Choose your favorite representation for a map R^2 -> R^n (could be a stack of grids, could be a neural net, who cares), and conceptually treat the problem as a plane of wire density, a plane of "AND" density, a plane of "XOR" density, etc. Hook up the boundary conditions (inputs and outputs on the left and right sides of the planes), run your favorite differentiable PDE solver, and anneal the discreteness of the wires and gates during training.
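Hand-wavy sketch of the representation (the left-to-right sweep here is a toy stand-in for a real differentiable PDE solver, and the neighbor rule is invented for illustration): the plane is a grid of per-cell density logits over {wire, AND, XOR}, the softmax temperature is the discreteness you anneal, and the signal enters on the left boundary and is read off the right:

    import jax
    import jax.numpy as jnp

    def fields(logits, temp):
        # logits: (H, W, 3) unnormalized densities for {wire, AND, XOR} at
        # each point of the plane. Lowering temp over training sharpens
        # each cell toward one discrete component -- that's the annealing.
        return jax.nn.softmax(logits / temp, axis=-1)

    def propagate(f, inputs):
        # Toy stand-in for a differentiable PDE solve: sweep left to right,
        # each cell mixing its left neighbor (a) and the cell diagonally
        # up-left (b) according to the local densities.
        col = inputs                                  # (H,) left boundary condition
        for x in range(f.shape[1]):
            a = col
            b = jnp.roll(col, 1)                      # row above, wrapping at the edge
            wire, and_, xor = f[:, x, 0], f[:, x, 1], f[:, x, 2]
            col = wire * a + and_ * (a * b) + xor * (a + b - 2 * a * b)
        return col                                    # (H,) right boundary = outputs

    logits = jax.random.normal(jax.random.PRNGKey(0), (8, 16, 3))
    out = propagate(fields(logits, temp=1.0), jnp.zeros(8).at[1].set(1.0))

Training would then be gradient descent on the boundary error through the solver, stepping temp down toward zero so the density planes crystallize into discrete wires and gates.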