With 1.58-bit ternary quantization (every weight forced into {-1, 0, +1}, i.e. log2(3) ≈ 1.58 bits each), you may think you're running a big model, but really you're just running a "mini" version of it
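A minimal sketch of what that compression does to the weights, assuming absmean-style ternary quantization (as described for BitNet b1.58); the function name and error metric here are illustrative, not a specific library's API:

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    # Absmean scaling: divide by the mean absolute weight,
    # then round each entry to the nearest value in {-1, 0, +1}.
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = ternary_quantize(w)

# Each weight now carries at most log2(3) ~= 1.58 bits of information.
print("bits per weight:", np.log2(3))

# Dequantizing gives only a coarse approximation of the original weights:
w_hat = q * scale
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print("relative reconstruction error:", rel_err)
```

The nonzero reconstruction error is the point: the ternary model occupies the same architecture but represents a strictly coarser set of weights than the full-precision original.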