Hacker News

c0rruptbytes 2 hours ago

Ideally, if ternary models work, the math is extremely easy for computers (addition/subtraction vs. 16-bit multiplication).

jjcm 2 hours ago

Not quite, as I understand it. The ternary approach bonsai uses relies on an FP16 scaling factor that the ternary values are mapped through. You're still using 16-bit multiplication; it's just that the weights are far more compressed.

c0rruptbytes 2 hours ago

Fair. I think I was referring more to the 1.58-bit architecture in general, since the original paper (Figure 3) shows FP16 multiplication and addition being replaced by INT8 addition alone. I need to dive deeper into bonsai to see whether it differs.

https://arxiv.org/pdf/2402.17764