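Seemingly irrelevant data can help because model weights are entangled: the same parameters are shared across many inputs, so training on one kind of data also shifts the model's behavior on others.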
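A toy sketch of what entanglement could mean here, assuming a linear model with a single shared weight vector (the model, inputs, and learning rate are hypothetical, chosen only for illustration). A gradient step on one input moves the prediction for a different input whenever the two share features, and leaves it alone when they do not.

```python
# Hypothetical toy model: one shared weight vector w, prediction = w . x
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd_step(w, x, y, lr=0.1):
    # One gradient step on the squared error 0.5 * (w.x - y)^2
    err = predict(w, x) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0, 0.0]
x_target = [1.0, 1.0, 0.0]    # the input we actually care about
x_overlap = [0.0, 1.0, 1.0]   # "irrelevant" input that shares one feature
x_disjoint = [0.0, 0.0, 1.0]  # input that shares no feature with x_target

before = predict(w, x_target)

# Training only on the overlapping input still changes the target prediction.
after_overlap = predict(sgd_step(w, x_overlap, y=1.0), x_target)

# Training only on the disjoint input leaves the target prediction unchanged.
after_disjoint = predict(sgd_step(w, x_disjoint, y=1.0), x_target)

print(before, after_overlap, after_disjoint)
```

Whether the shift helps or hurts depends on how the labels of the two inputs relate; the sketch only shows that the coupling exists, not that it is always beneficial.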