Yes, and even this search doesn't actually require trillions of parameters, since the switching parameters will be sparse, which means you can apply a FakeParameter trick. Suppose I want a trillion sparse parameters: that's a million by a million. Let's just model those parameters as inner products of a million vectors, each of some dimension N. Now we only store a million times N numbers, which for modest N is in the regime of megabytes up to a GB.
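A minimal sketch of that idea in NumPy, just to make the storage arithmetic concrete. The names `make_factors` and `fake_parameter`, the choice N = 8, the float32 dtype and the random initialization are all my assumptions for illustration, not anything specific from the discussion above.

```python
import numpy as np

NUM_ITEMS = 1_000_000  # a million vectors -> a million-by-million implied matrix
DIM = 8                # "some dimension N"; 8 is an arbitrary choice here


def make_factors(num_items: int, dim: int, seed: int = 0) -> np.ndarray:
    """Allocate the only parameters that are actually stored (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_items, dim), dtype=np.float32)


def fake_parameter(factors: np.ndarray, i: int, j: int) -> float:
    """Entry (i, j) of the implied matrix, computed on demand as an inner product."""
    return float(factors[i] @ factors[j])


factors = make_factors(NUM_ITEMS, DIM)

implied_count = NUM_ITEMS * NUM_ITEMS  # parameters we can address: 10**12
stored_count = factors.size            # parameters we actually keep: 8 * 10**6
stored_megabytes = factors.nbytes / 1e6

print(f"implied parameters: {implied_count:.1e}")
print(f"stored parameters:  {stored_count:.1e} ({stored_megabytes:.0f} MB)")
```

With float32 and N = 8 this stores 32 MB; N around 250 would land near 1 GB. The full million-by-million matrix is never materialized, only the entries you ask for.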

For extreme regularization, one can even go down to 10 arbitrary-precision numbers: if we have a single vector of 10 dimensions, we can reorder its components in 10! different ways.

10! = 3 628 800

So we can retrieve ~3.6M vectors from it, and we can form about 13 T inner products between them (3 628 800 squared is roughly 1.3 × 10^13).
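Here is a rough sketch of the permutation version. It indexes the 10! reorderings with the factorial number system so that no permutation is ever stored. The helper names are hypothetical, and I'm using ordinary float64 values rather than arbitrary-precision numbers, which is a simplifying assumption.

```python
import math

import numpy as np

# The only stored parameters: one 10-dimensional vector (random values are an assumption).
base = np.random.default_rng(0).standard_normal(10)


def kth_permutation(k: int, n: int) -> list[int]:
    """Return the k-th permutation of range(n) in lexicographic order."""
    items = list(range(n))
    perm = []
    for pos in range(n, 0, -1):
        # Digit of k in the factorial number system selects the next item.
        idx, k = divmod(k, math.factorial(pos - 1))
        perm.append(items.pop(idx))
    return perm


def virtual_vector(base: np.ndarray, k: int) -> np.ndarray:
    """The k-th 'retrieved' vector: the base vector with reordered components."""
    return base[kth_permutation(k, len(base))]


def fake_parameter(base: np.ndarray, i: int, j: int) -> float:
    """Implied parameter (i, j): inner product of two reorderings of the same vector."""
    return float(virtual_vector(base, i) @ virtual_vector(base, j))


num_vectors = math.factorial(len(base))  # 3_628_800
num_products = num_vectors ** 2          # 13_168_189_440_000, about 13 T

print(num_vectors, num_products)
print(fake_parameter(base, 12_345, 3_000_000))
```

One thing the sketch makes visible: the inner product of two reorderings depends only on the relative permutation between them, so those ~13 T entries share at most 10! distinct values. That heavy tying is presumably the point of calling it extreme regularization.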