Hacker News

I recently read about an upcoming GPU chip vendor that speeds up the multiply-and-add operation (the core of pretty much any neural network, including LLMs) by transforming values to a log scale, where multiplication becomes addition.

The issue was the size of the log tables that would be needed, so they settled for some trade-off.

My question is: can the expensive log-domain addition / correction function be implemented as fixed hardware lookup tables or approximate units?
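To make the question concrete, here is a rough software sketch of what I mean. It is not the vendor's design, just the textbook log-number-system identity log2(a + b) = max(x, y) + log2(1 + 2^-|x - y|), where x = log2(a) and y = log2(b). The correction term depends only on the difference |x - y|, so it can be read from a small table. The names `FRAC_BITS`, `MAX_DIFF`, `log_mul`, and `log_add` are my own, the table resolution is an arbitrary assumption, and only positive values are handled (a real unit would track the sign separately).

```python
import math

# Assumed parameters, chosen only for illustration.
FRAC_BITS = 4   # table step is 1/16 in the log2 domain
MAX_DIFF = 16   # for differences >= 16 the correction is treated as zero

# Correction table: f(d) = log2(1 + 2**-d) sampled at d = 0, 1/16, 2/16, ...
# With these parameters the table has 16 * 16 = 256 entries.
TABLE = [
    math.log2(1.0 + 2.0 ** -(i / (1 << FRAC_BITS)))
    for i in range(MAX_DIFF << FRAC_BITS)
]


def log_mul(x, y):
    """Multiply two positive values given as log2: just an addition."""
    return x + y


def log_add(x, y):
    """Approximate log2(2**x + 2**y) using the lookup table."""
    hi, lo = max(x, y), min(x, y)
    # Quantize the difference to a table index (truncation, no interpolation).
    idx = int((hi - lo) * (1 << FRAC_BITS))
    if idx >= len(TABLE):
        # The smaller operand is negligible compared to the larger one.
        return hi
    return hi + TABLE[idx]


if __name__ == "__main__":
    x, y = math.log2(3.0), math.log2(5.0)
    print("3 * 5 ~", 2.0 ** log_mul(x, y))
    print("3 + 5 ~", 2.0 ** log_add(x, y))
```

With a step of 1/16 and plain truncation the sum is off by up to a couple of percent. A finer table or interpolation between entries reduces the error but makes the table or the logic bigger, which I assume is the trade-off in question.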

Update: found the vendor: https://www.tensordyne.ai/silicon-and-math