Not quite as I understand it. The ternary approach bonsai uses leverages a FP16 scaling factor that each value in the ternary maps to. You're still using 16 bit multiplication, it's just that the weights are far more compressed.
fair, i think i was referring more to 1.58 bit architecture in general since the original paper (Figure 3) shows that we eliminate FP16 multiplication and addition just for INT8 addition. I need to dive deeper into bonsai overall if it differs
Not quite as I understand it. The ternary approach bonsai uses leverages a FP16 scaling factor that each value in the ternary maps to. You're still using 16 bit multiplication, it's just that the weights are far more compressed.
fair, i think i was referring more to 1.58 bit architecture in general since the original paper (Figure 3) shows that we eliminate FP16 multiplication and addition just for INT8 addition. I need to dive deeper into bonsai overall if it differs
https://arxiv.org/pdf/2402.17764