>you'd need every single one of them, millions upon millions of them, to be all zero

If they were all correlated with each other, that doesn't seem far-fetched.

OK, but it's well known that you shouldn't initialize your network parameters to a single constant; you initialize them with random numbers instead.

The model can converge towards such a state even if randomly initialized.

Both you and the comment above are correct: initializing with i.i.d. elements ensures that correlations at the start are not disastrous for training, but strong correlations get baked into the weights as training proceeds, so pretty much anything could happen.
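
To make the constant-initialization point concrete, here is a rough sketch of the symmetry problem. It is a toy setup of my own, not anything from the comments above: a 1-input tanh network with 4 hidden units fit to sin(x) with plain SGD, and the function name `train_hidden_weights`, the learning rate, and the data are all assumptions. It only illustrates the initialization half of the argument; it says nothing about what correlations a real model picks up later in training.

```python
import math
import random

def train_hidden_weights(init, steps=50, lr=0.05, hidden=4, seed=0):
    """Train a tiny 1-input tanh network on y = sin(x) with plain SGD.

    `init` is a function rng -> float used to draw every parameter.
    Returns the final input-to-hidden weights.
    """
    rng = random.Random(seed)
    w1 = [init(rng) for _ in range(hidden)]  # input -> hidden weights
    w2 = [init(rng) for _ in range(hidden)]  # hidden -> output weights
    data = [(x / 10.0, math.sin(x / 10.0)) for x in range(-20, 21)]
    for _ in range(steps):
        for x, y in data:
            h = [math.tanh(w * x) for w in w1]
            err = sum(v * hi for v, hi in zip(w2, h)) - y
            # Gradients of the loss 0.5 * err**2
            g2 = [err * hi for hi in h]
            g1 = [err * v * (1 - hi * hi) * x for v, hi in zip(w2, h)]
            w1 = [w - lr * g for w, g in zip(w1, g1)]
            w2 = [w - lr * g for w, g in zip(w2, g2)]
    return w1

# Every parameter starts at the same constant: all hidden units see the
# same gradient at every step, so they never become different from each other.
constant = train_hidden_weights(lambda rng: 0.5)

# Parameters drawn independently: the symmetry is broken from the start.
randomized = train_hidden_weights(lambda rng: rng.uniform(-0.5, 0.5))

print("distinct hidden weights, constant init:", len(set(constant)))
print("distinct hidden weights, random init:  ", len(set(randomized)))
```

With the constant start, the hidden units stay exact copies of one another, so the network effectively has a single hidden unit no matter how many you give it. The random start avoids that, but nothing in it prevents the weights from becoming correlated later on.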