MoE is fine. You can put the shared weights on the 5090 (they will fit handily even for the largest models) and the expert weights in CPU RAM, possibly offloading some of the weights from storage.
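To sanity-check that split for a given model, here is a rough back-of-the-envelope sketch. The parameter counts, the bits-per-weight figure and the RAM size are made-up illustration values, not the specs of any real model, and `MoEConfig` and `plan` are hypothetical names. It only counts weights, so KV cache and activations still need VRAM headroom on top.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # All values passed in here are hypothetical, for illustration only.
    shared_params: float    # attention, embeddings, router, any shared experts
    expert_params: float    # routed expert FFN weights, summed over all layers
    bits_per_weight: float  # average bits per weight after quantization


def gib(params: float, bits: float) -> float:
    """Size in GiB of `params` weights stored at `bits` bits each."""
    return params * bits / 8 / 2**30


def plan(cfg: MoEConfig, vram_gib: float, ram_gib: float) -> dict:
    """Weights-only placement estimate: shared on GPU, experts in CPU RAM."""
    shared = gib(cfg.shared_params, cfg.bits_per_weight)
    experts = gib(cfg.expert_params, cfg.bits_per_weight)
    return {
        "shared_gib": shared,
        "experts_gib": experts,
        "shared_fits_gpu": shared <= vram_gib,
        "experts_fit_ram": experts <= ram_gib,
        # Whatever does not fit in RAM would have to be read from storage.
        "spill_to_storage_gib": max(0.0, experts - ram_gib),
    }


if __name__ == "__main__":
    # Assumed: 20B shared + 650B expert params at ~4.5 bits, 32 GB GPU, 256 GiB RAM.
    cfg = MoEConfig(shared_params=20e9, expert_params=650e9, bits_per_weight=4.5)
    for key, value in plan(cfg, vram_gib=32, ram_gib=256).items():
        print(f"{key}: {value}")
```

With numbers like these the shared part takes only a fraction of the card, while the experts exceed RAM and the remainder has to come from storage, which is where the speed hit would be.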