Differentiable Logic Gate Networks [0] are super interesting. However, I still don't like that the wiring is fixed at initialization rather than learned.

I did some extremely rough research into learnable wirings [1], but couldn't even get past learning ~4-bit addition.

[0]: https://arxiv.org/abs/2210.08277

[1]: https://ezb.io/thoughts/program_synthesis/boolean_circuits/2...

One of the easier solutions that does the rounds periodically (in many forms above and beyond logic gates, such as symbolic regression) is just densely connecting everything and ensuring that an identity function exists among the options. Anneal an L1 penalty on non-identity nodes, and you can learn a sparse representation.

There are a number of details to work through, such as defining an "identity" for gates with 2 inputs and 1 output (simplest answer: just don't offer those; use 2-in/2-out gates like a half adder instead of AND or XOR, and add a post-processing step that removes the extra wires you don't care about), or defining "densely connected" in a way that doesn't explode combinatorially (many solutions exist, and the details only matter a little). But it's the brute-force solution, and you only pay the cost during training; a rough sketch of the penalty idea follows below.
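For concreteness, a minimal sketch of how I read that penalty. The gate set, the relaxations, the class names, and the use of PyTorch are all my own illustrative choices, not anything from the comment above:

```python
# Sketch: every node holds a soft distribution over gate types that
# includes IDENTITY, and an annealed L1 penalty pushes nodes toward
# IDENTITY so the trained circuit ends up sparse.
import torch
import torch.nn.functional as F

GATES = ["IDENTITY", "AND", "OR", "XOR", "NAND"]  # identity must be on offer
ID = GATES.index("IDENTITY")

def gate_outputs(a, b):
    # Real-valued relaxations of each gate for inputs in [0, 1].
    return torch.stack([
        a,                  # IDENTITY (pass the first input through)
        a * b,              # AND
        a + b - a * b,      # OR
        a + b - 2 * a * b,  # XOR
        1 - a * b,          # NAND
    ], dim=-1)

class SoftGate(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(len(GATES)))

    def forward(self, a, b):
        p = F.softmax(self.logits, dim=-1)       # distribution over gates
        return (gate_outputs(a, b) * p).sum(-1)  # soft mixture of gates

    def non_identity_mass(self):
        # The L1 penalty target: probability mass on anything but IDENTITY.
        p = F.softmax(self.logits, dim=-1)
        return p.sum() - p[ID]

# Training sketch: loss = task_loss + lam * sum(g.non_identity_mass()
# for g in gates), with lam annealed upward so unneeded nodes collapse
# to IDENTITY and can be stripped out in post-processing.
```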

There are lots of other fun ways to handle that problem though. One of my favorites is to represent your circuit as "fields" rather than discrete nodes. Choose your favorite representation for R2->Rn (could be a stack of grids, could be a neural net, who cares), and you conceptually represent the problem as a plane of wire density, a plane of "AND" density, a plane of "XOR" density, etc. Hook up the boundary conditions (inputs and outputs on the left and right side of the planes) and run your favorite differentiable PDE solver, annealing the discreteness of the wires and gates during training.
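To make that less hand-wavy, here's a toy discretization of the idea: a left-to-right sweep over a grid of soft gate-type densities. This is my own stand-in for a proper differentiable PDE solver, and the tiny gate set and all names are hypothetical:

```python
# Toy "fields" circuit: each grid cell holds a soft distribution over
# {WIRE, AND, XOR}; signals sweep left to right, each cell combining its
# own value with its upstream neighbor according to the local cell type.
import torch
import torch.nn.functional as F

H, W = 8, 6    # plane height (bit lines) and depth (columns)
TYPES = 3      # 0: WIRE, 1: AND, 2: XOR

class FieldCircuit(torch.nn.Module):
    def __init__(self, temp=1.0):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(W, H, TYPES))
        self.temp = temp    # anneal toward 0 to make gates/wires discrete

    def forward(self, x):   # x: (batch, H), the left boundary condition
        for col in range(W):
            p = F.softmax(self.logits[col] / self.temp, dim=-1)  # (H, TYPES)
            up = torch.roll(x, 1, dims=-1)  # neighbor "from above" (wraps)
            wire = x
            and_ = x * up
            xor_ = x + up - 2 * x * up
            x = p[:, 0] * wire + p[:, 1] * and_ + p[:, 2] * xor_
        return x            # right boundary = outputs

# Inputs clamp the left edge, outputs are read off the right edge, and
# lowering `temp` over training anneals the soft field toward a discrete
# layout of wires and gates.
```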

Ha! I have spent the last 2 years on this idea as a pet research project and have recently found a way of learning the wiring in a scalable fashion (arbitrary number of input bits, arbitrary number of output bits). Would love to chat with someone also obsessed with this idea.

I'm also very interested. I played around a lot with Differentiable Logic Gate Networks a couple of months ago, looking at how to make the learned wiring scale to a bigger number of gates. I had a couple of ideas that seemed to work at a smaller scale, but they had trouble converging with deeper networks.

Also very interested. Do you have any code on GitHub?

I think the techniques in “Weight Agnostic Neural Networks” should be applicable here, too. It uses a variant of NEAT, I believe. This would allow for learning the topology and wiring rather than just the gates. But in practice it is probably pretty slow, and may not be all that different from a pruned and optimized DLGN.

https://weightagnostic.github.io/
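For flavor, a toy NEAT-style mutation step on a gate graph. This is purely illustrative, not code from the WANN paper (which evolves activation-function networks, not logic gates), and it skips real NEAT bookkeeping like innovation numbers and cycle checks:

```python
# Grow topology by structural mutation instead of gradient descent:
# either split an existing connection with a new gate node, or add a
# fresh connection between two existing nodes.
import random

def mutate(nodes, edges, next_id):
    """nodes: {id: gate_name}; edges: set of (src, dst) pairs."""
    if edges and random.random() < 0.5:
        # Split a random connection with a new gate node.
        src, dst = random.choice(sorted(edges))
        edges.remove((src, dst))
        nodes[next_id] = random.choice(["AND", "OR", "XOR", "NAND"])
        edges.update({(src, next_id), (next_id, dst)})
        next_id += 1
    else:
        # Add a fresh connection (a real implementation would reject cycles).
        src, dst = random.sample(sorted(nodes), 2)
        edges.add((src, dst))
    return nodes, edges, next_id
```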

To ruin it for everyone: They're also patented :) https://patents.google.com/patent/WO2023143707A1/en?inventor...

What's the innovation here?

Using logic operators? Picking something from a range of options with SoftMax? Having a distribution to pick from?

I remember reading about adaptive boolean logic networks in the 90's. I remember a paper about them using the phrase "Just say no to backpropagation". It probably goes back considerably earlier.

Fuzzy logic was all the rage in the 90's too. Almost at the level of marketers sticking the label on everything the way AI is done today. Most of that was just 'may contain traces of stochasticity' but the academic field used actual defined logical operators for interpolated values from zero to one.
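(For anyone who hasn't seen them, the standard Zadeh operators are what I mean by actual defined logical operators on [0, 1]:)

```python
# Classic fuzzy connectives on truth values in [0, 1].
def f_and(a, b): return min(a, b)
def f_or(a, b):  return max(a, b)
def f_not(a):    return 1.0 - a

assert f_and(0.75, 0.25) == 0.25
assert f_or(0.75, 0.25) == 0.75
assert f_not(0.75) == 0.25
```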

A quick search on picking from a selection turned up https://psycnet.apa.org/record/1960-03588-000, but these days softmax is just about ubiquitous.

> What's the innovation here?
> Having a distribution to pick from?

As I understand it, it's exactly this. Specifically, representing each neuron in a neural network as a probability distribution over logic gates, learning that distribution's parameters via gradient descent, and then collapsing the distribution into the most probable logic gate for each neuron at inference time. The author has a few more details in their thesis:

https://arxiv.org/abs/2209.00616
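A minimal sketch of that relaxation as I understand it from the paper; this is my paraphrase, not the difflogic library's API, and the class name is made up:

```python
# Each neuron holds logits over the 16 two-input Boolean functions, mixes
# their real-valued relaxations while training, and is collapsed to its
# argmax gate for inference.
import torch
import torch.nn.functional as F

def all_gates(a, b):
    # Real-valued relaxations of all 16 binary Boolean functions.
    return torch.stack([
        torch.zeros_like(a),      # FALSE
        a * b,                    # AND
        a - a * b,                # a AND NOT b
        a,                        # a
        b - a * b,                # NOT a AND b
        b,                        # b
        a + b - 2 * a * b,        # XOR
        a + b - a * b,            # OR
        1 - (a + b - a * b),      # NOR
        1 - (a + b - 2 * a * b),  # XNOR
        1 - b,                    # NOT b
        1 - b + a * b,            # a OR NOT b
        1 - a,                    # NOT a
        1 - a + a * b,            # NOT a OR b
        1 - a * b,                # NAND
        torch.ones_like(a),       # TRUE
    ], dim=-1)

class DiffLogicNeuron(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.randn(16))

    def forward(self, a, b):
        p = F.softmax(self.logits, dim=-1)
        return (all_gates(a, b) * p).sum(-1)  # soft mixture during training

    def harden(self):
        return int(self.logits.argmax())      # discrete gate at inference
```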

Specifically, it's the training approach that's patented. I'm glad to see that people are trying to improve on their method, so the patent will likely become irrelevant as better methods emerge.

The author also published an approach applying their idea to convolutional kernels in CNNs:

https://arxiv.org/abs/2411.04732

In the paper they promise to update their difflogic library with the resulting code, but so far they seem to have conveniently forgotten to do so.

I also think their patent is too broad, but I suppose it speaks well of the entire ML community that we haven't seen more patents in this area. I could also imagine that, given the very impressive performance improvements the approach promises, they're somewhat afraid it will be used for embedded military applications.

Liquid NNs are also able to generate decision trees.

My Zojirushi rice cooker says "fuzzy logic" on it, and it's 15 years old, so the phrase was still being marketed 15 years after its "inception".