This confused me at first as well: inactive experts skip compute, but their weights are still loaded, so memory usage does not shrink at all.
I found this visualisation helpful - https://vectree.io/c/sparse-activation-patterns-and-memory-e...
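To make the point concrete, here's a minimal sketch (my own toy example, not from the linked page) of a mixture-of-experts layer: all expert weight matrices are allocated up front, but each token's forward pass only multiplies through the top-k routed experts. The parameter counts at the end show why compute is sparse while memory is not.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2

# ALL expert weights are allocated up front -- this is the resident memory cost.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route one token vector through only its top-k experts."""
    scores = x @ router
    active = np.argsort(scores)[-top_k:]   # indices of the selected experts
    gates = np.exp(scores[active])
    gates /= gates.sum()                   # softmax over the k winners
    # Compute touches only k expert matrices...
    y = sum(g * (x @ experts[i]) for g, i in zip(gates, active))
    return y, active

x = rng.standard_normal(d_model)
y, active = moe_forward(x)

# ...but memory holds all n_experts of them regardless.
total_params = sum(w.size for w in experts)
active_params = sum(experts[i].size for i in active)
print(f"resident expert params: {total_params}")   # 8 * 16 * 16 = 2048
print(f"params used this token: {active_params}")  # 2 * 16 * 16 = 512
```

So with top-2 routing over 8 experts, only a quarter of the expert parameters are touched per token, yet all 2048 stay in memory.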