This.
In fact, it can be slower, because hardware is generally not optimized for the 1-bit case; that leaves a lot of low-hanging fruit for hardware designers, and we may see improvements in the next generation of hardware.
Isn't digital (binary) hardware literally optimized for the 1-bit case, by definition?
People are confusing word size…
The CPU can handle up to word-size bits at once. I believe they mean that a lot of assembly was written for integer math rather than bit math; word sizes are 4+ bytes on modern CPUs. However, it is unlikely we'll see improvements in this area, because by definition using 64-bit floats already uses the full word size. So… that's the max throughput. Sending 1 bit instead of 64 bits would be considerably slower, so this entire approach is funny.
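For concreteness, a minimal C sketch of bit-packing (identifiers here are hypothetical, not from any particular library): a single 64-bit word holds 64 one-bit weights, so a full-word load never moves "1 bit" at a time.

    #include <stdint.h>
    #include <stddef.h>

    /* Pack n binary weights (each 0 or 1) into 64-bit words,
       64 weights per word. Caller must zero-initialize `out`. */
    void pack_bits(const uint8_t *w, size_t n, uint64_t *out) {
        for (size_t i = 0; i < n; i++) {
            if (w[i])
                out[i / 64] |= (uint64_t)1 << (i % 64);
        }
    }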
No, because there are algorithmic shortcuts that allow approximations and skipped steps compared to a strict bit-by-bit calculation: in-memory bit reads, implicit rules, and other structural advantages in how GPU and CPU instruction sets are implemented in hardware.
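One such shortcut, used in XNOR-style binary networks, replaces multiply-accumulate with XNOR plus popcount over packed words. A hedged sketch, assuming both vectors are packed as above, bits encode {-1, +1}, n is a multiple of 64, and a GCC/Clang compiler provides __builtin_popcountll:

    #include <stdint.h>
    #include <stddef.h>

    /* Dot product of two {-1,+1} vectors stored as packed bits
       (bit = 1 means +1, bit = 0 means -1); n is the total bit
       count, assumed to be a multiple of 64 for simplicity. */
    int64_t binary_dot(const uint64_t *a, const uint64_t *b, size_t n) {
        int64_t matches = 0;
        for (size_t i = 0; i < n / 64; i++) {
            /* XNOR marks positions where the signs agree;
               popcount tallies 64 of them per instruction. */
            matches += __builtin_popcountll(~(a[i] ^ b[i]));
        }
        /* Agreements contribute +1, disagreements -1, so
           dot = matches - (n - matches) = 2*matches - n. */
        return 2 * matches - (int64_t)n;
    }

One loop iteration covers 64 weight/activation pairs with a couple of word-wide ops, which is the sense in which 1-bit math can use the full word width rather than wasting it.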
FPGAs could be highly competitive for models with unusual but small bit widths, especially single-bit ones, since their optimizers will handle that easily.