Well done — really enjoyed this. We could use this kind of optimization in our library[0], which builds differentiable logic networks out of gates like AND, XOR, etc.
It focuses on training circuit-like structures via gradient descent using soft logic semantics. The idea of compiling trained models down to efficient bit-parallel C is exactly the kind of post-training optimization we've been exploring: converting soft gates back into hard boolean logic (e.g. by thresholding or symbolic substitution), then emitting optimized code for inference (C, WASM, HDL, etc.).
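To make that concrete, here is a rough sketch of what the emitted C could look like once each soft gate has been thresholded to a single boolean op. This isn't output from our codegen; the gate/function names (layer0_node3, predict, etc.) are made up for illustration. Each uint64_t lane carries one of 64 independent samples, so one bitwise op evaluates a gate for 64 inputs at once:

    #include <stdint.h>

    /* Hypothetical hardened network: one bitwise op per gate,
       64 samples packed per word. */
    static inline uint64_t layer0_node3(uint64_t x2, uint64_t x7) {
        return x2 & x7;      /* soft AND hardened to bitwise AND */
    }
    static inline uint64_t layer1_node0(uint64_t a, uint64_t b) {
        return a ^ b;        /* soft XOR hardened to bitwise XOR */
    }
    uint64_t predict(uint64_t x2, uint64_t x7, uint64_t x9) {
        uint64_t h = layer0_node3(x2, x7);
        return layer1_node0(h, x9);
    }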
The Game of Life kernel is a great example of where logic-based nets really shine.
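For reference, here's the kind of bit-parallel kernel a hardened Life net would ideally compile down to: neighbour counts built from bitwise full adders, 64 cells per word. A minimal sketch (names are illustrative, and it ignores wraparound at the word edges for brevity):

    #include <stdint.h>

    /* up/cur/down are the row above, the row itself, and the row below,
       one cell per bit. Returns the next state of the current row. */
    static uint64_t life_row_step(uint64_t up, uint64_t cur, uint64_t down) {
        uint64_t rows[3] = { up, cur, down };
        uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;  /* per-cell 4-bit counters */
        for (int r = 0; r < 3; r++) {
            uint64_t nb[3] = { rows[r] << 1, rows[r], rows[r] >> 1 };
            for (int c = 0; c < 3; c++) {
                if (r == 1 && c == 1) continue;   /* skip the cell itself */
                uint64_t b = nb[c], carry;
                /* ripple-carry add of one neighbour bit into the counters */
                carry = s0 & b; s0 ^= b; b = carry;
                carry = s1 & b; s1 ^= b; b = carry;
                carry = s2 & b; s2 ^= b;
                s3 ^= carry;
            }
        }
        uint64_t eq3 = ~s3 & ~s2 &  s1 &  s0;     /* neighbour count == 3 */
        uint64_t eq2 = ~s3 & ~s2 &  s1 & ~s0;     /* neighbour count == 2 */
        return eq3 | (cur & eq2);
    }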