Couldn't have said it better, although this is only for grokking with the modular addition task on networks with suitable architectures. L1 regularization is absolutely a clear form of compression. The modular addition example is one of the best cases to see the phenomenon in action.